Open vladmandic opened 2 hours ago
Perhaps after #9213.
Note that exotic FPX schemes are already supported (FP6, FP5, FP4) with torchao. Check out this repo for that: https://github.com/sayakpaul/diffusers-torchao
yes, i'm following that pr closely :)
also, the torchao work makes all of this easier. the request here is not to reimplement any of the quantization work done so far, but to add a diffusers equivalent of `transformers.modeling_gguf_pytorch_utils.load_gguf_checkpoint()`, which returns a state_dict (with key re-mapping as needed); the rest of the load can then proceed as-is.
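As a rough sketch of the key re-mapping part of such a helper (the function name and the prefix map below are hypothetical, purely for illustration; the actual GGUF parsing and dequantization would come from a GGUF reader such as transformers' `load_gguf_checkpoint`, and the real FLUX key map would have to be worked out in the PR):

```python
# Hypothetical sketch: rename GGUF tensor keys to diffusers-style
# state_dict keys so the rest of the loading path can run unchanged.
# The prefix mapping is illustrative only, not the real FLUX key map.

GGUF_TO_DIFFUSERS_PREFIXES = {
    "token_embd.": "embedder.",       # illustrative mapping
    "blk.": "transformer_blocks.",    # illustrative mapping
}

def remap_gguf_state_dict(tensors: dict) -> dict:
    """Return a new state_dict with GGUF tensor names remapped.

    `tensors` maps GGUF tensor names to already-dequantized tensors,
    e.g. what a GGUF reader would hand back.
    """
    remapped = {}
    for name, tensor in tensors.items():
        new_name = name
        for gguf_prefix, diffusers_prefix in GGUF_TO_DIFFUSERS_PREFIXES.items():
            if new_name.startswith(gguf_prefix):
                new_name = diffusers_prefix + new_name[len(gguf_prefix):]
                break
        remapped[new_name] = tensor
    return remapped

# Toy usage with placeholder strings standing in for real tensors:
state_dict = remap_gguf_state_dict({
    "token_embd.weight": "W_emb",
    "blk.0.attn_q.weight": "W_q0",
})
print(sorted(state_dict))
```

With keys remapped this way, the returned state_dict could be fed into the existing model-loading machinery unchanged.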
Yeah for sure. Thanks for following along!
GGUF is becoming a preferred distribution format for FLUX fine-tunes.
Transformers recently added general support for GGUF and is slowly adding support for additional model types (the implementation adds a `gguf_file` param to the `from_pretrained` method).

This PR adds support for loading GGUF files to `T5EncoderModel`. I've tested the code with the quants available at https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main and it's working with the current Flux implementation in diffusers.

However, as `FluxTransformer2DModel` is defined in the diffusers library, support has to be added here to be able to load the actual transformer model, which is most (if not all) of a Flux fine-tune.

Examples that can be used:
- with weights quantized as q4_0, q4_1, q5_0, or q5_1
- with weights simply converted from f16
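For reference, q4_0 is the simplest of these block formats: each block of 32 weights stores one fp16 scale followed by 16 bytes of packed 4-bit quants, and each weight is recovered as `(nibble - 8) * scale`. A minimal pure-Python sketch of the dequantization, following the ggml block layout (not the code this PR would actually use, which should come from an existing GGUF reader):

```python
import struct

def dequantize_q4_0(block: bytes) -> list:
    """Dequantize one ggml q4_0 block: a 2-byte fp16 scale plus 16
    bytes holding 32 packed 4-bit quants. Element j comes from the
    low nibble of byte j, element j+16 from the high nibble."""
    assert len(block) == 18  # 2 (fp16 scale) + 16 (packed nibbles)
    (d,) = struct.unpack("<e", block[:2])  # fp16 scale
    qs = block[2:]
    out = [0.0] * 32
    for j in range(16):
        out[j] = ((qs[j] & 0x0F) - 8) * d       # low nibble
        out[j + 16] = ((qs[j] >> 4) - 8) * d    # high nibble
    return out

# Toy block: scale 0.5, every nibble set to 9 -> each weight is (9-8)*0.5
block = struct.pack("<e", 0.5) + bytes([0x99] * 16)
print(dequantize_q4_0(block))  # 32 values, each 0.5
```

The other formats (q4_1, q5_0, q5_1) add a per-block minimum and/or a fifth quant bit on top of the same block structure.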
cc: @yiyixuxu @sayakpaul @DN6