huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Add GGUF loader for FluxTransformer2DModel #9487

Open · vladmandic opened this issue 2 hours ago

vladmandic commented 2 hours ago

GGUF is becoming a preferred means of distributing FLUX fine-tunes.

Transformers recently added general support for GGUF and is gradually adding support for additional model types (the implementation works by adding a gguf_file param to the from_pretrained method).

This PR adds support for loading GGUF files into T5EncoderModel. I've tested the code with the quants available at https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main, and it works with the current Flux implementation in diffusers.
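For reference, a minimal sketch of that setup (the GGUF filename below is illustrative; use whichever quant from the linked repo you prefer):

```python
import torch
from transformers import T5EncoderModel
from diffusers import FluxPipeline

# Load the GGUF-quantized T5 encoder via transformers' gguf_file support.
# The filename is illustrative; pick any quant from the repo linked above.
text_encoder_2 = T5EncoderModel.from_pretrained(
    "city96/t5-v1_1-xxl-encoder-gguf",
    gguf_file="t5-v1_1-xxl-encoder-Q8_0.gguf",
    torch_dtype=torch.bfloat16,
)

# Swap it in for the full-precision encoder in the existing Flux pipeline.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)
```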

However, since FluxTransformer2DModel is defined in the diffusers library, support has to be added here to load the actual transformer model, which is what most (if not all) Flux fine-tunes consist of.

Examples that can be used:

cc: @yiyixuxu @sayakpaul @DN6

sayakpaul commented 2 hours ago

Perhaps after #9213.

Note that exotic FPx schemes (FP6, FP5, FP4) are already supported via torchao. Check out this repo for that: https://github.com/sayakpaul/diffusers-torchao
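For context, the torchao route looks roughly like this (a sketch based on that repo; `fpx_weight_only(3, 2)` selects an FP6 layout with 3 exponent and 2 mantissa bits):

```python
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, fpx_weight_only

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Quantize the Flux transformer weights in place to FP6 (e3m2).
quantize_(pipe.transformer, fpx_weight_only(3, 2))
```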

vladmandic commented 1 hour ago

Yes, I'm following that PR closely :) Also, the torchao work makes all of this easier. The request here is not to reimplement any of the quantization work done so far, but to add a diffusers equivalent of transformers.modeling_gguf_pytorch_utils.load_gguf_checkpoint(), which returns a state_dict (with key re-mapping as needed); the rest of the load can then proceed as-is.
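A rough sketch of what such a helper could look like, assuming the gguf Python package is used to read and dequantize the tensors (as transformers does); the key re-mapping step and any lazy/on-the-fly dequantization are left as hypothetical placeholders:

```python
import numpy as np
import torch
from gguf import GGUFReader, dequantize


def load_gguf_state_dict(path: str, dtype=torch.bfloat16) -> dict:
    """Read a GGUF checkpoint and return a plain PyTorch state_dict.

    Every tensor is dequantized up front; a production loader would likely
    keep the quantized blocks and dequantize lazily to save memory.
    """
    reader = GGUFReader(path)
    state_dict = {}
    for tensor in reader.tensors:
        # dequantize() returns a float32 numpy array for any supported quant type.
        weights = dequantize(tensor.data, tensor.tensor_type)
        state_dict[tensor.name] = torch.from_numpy(np.copy(weights)).to(dtype)
    return state_dict


# Hypothetical usage: remap_to_diffusers() stands in for the key re-mapping,
# since Flux GGUF checkpoints use the original (non-diffusers) key names.
# state_dict = remap_to_diffusers(load_gguf_state_dict("flux1-dev-Q8_0.gguf"))
# transformer = FluxTransformer2DModel.from_config(config)
# transformer.load_state_dict(state_dict)
```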

sayakpaul commented 1 hour ago

Yeah for sure. Thanks for following along!