Open vladmandic opened 2 hours ago
Perhaps after #9213.
Note that exotic FPX schemes are already supported (FP6, FP5, FP4) with torchao. Check out this repo for that: https://github.com/sayakpaul/diffusers-torchao
yes, i'm following that pr closely :)
also, the torchao work makes all of this easier. the request here is not to reimplement any of the quantization work done so far, but to add a diffusers equivalent of `transformers.modeling_gguf_pytorch_utils.load_gguf_checkpoint()`, which returns a state_dict (with key re-mapping as needed); the rest of the load can then proceed as-is.
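As a rough sketch of the key re-mapping part of such a helper (the function name and the prefix map below are hypothetical, purely for illustration; the actual GGUF parsing and dequantization would come from a GGUF reader such as transformers' `load_gguf_checkpoint`, and the real FLUX key map would have to be worked out in the PR):

```python
# Hypothetical sketch: rename GGUF tensor keys to diffusers-style
# state_dict keys so the rest of the loading path can run unchanged.
# The prefix mapping is illustrative only, not the real FLUX key map.

GGUF_TO_DIFFUSERS_PREFIXES = {
    "token_embd.": "embedder.",       # illustrative mapping
    "blk.": "transformer_blocks.",    # illustrative mapping
}

def remap_gguf_state_dict(tensors: dict) -> dict:
    """Return a new state_dict with GGUF tensor names remapped.

    `tensors` maps GGUF tensor names to already-dequantized tensors,
    e.g. what a GGUF reader would hand back.
    """
    remapped = {}
    for name, tensor in tensors.items():
        new_name = name
        for gguf_prefix, diffusers_prefix in GGUF_TO_DIFFUSERS_PREFIXES.items():
            if new_name.startswith(gguf_prefix):
                new_name = diffusers_prefix + new_name[len(gguf_prefix):]
                break
        remapped[new_name] = tensor
    return remapped

# Toy usage with placeholder strings standing in for real tensors:
state_dict = remap_gguf_state_dict({
    "token_embd.weight": "W_emb",
    "blk.0.attn_q.weight": "W_q0",
})
print(sorted(state_dict))
```

With keys remapped this way, the returned state_dict could be fed into the existing model-loading machinery unchanged.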
Yeah for sure. Thanks for following along!
GGUF is becoming a preferred distribution format for FLUX fine-tunes.
Transformers recently added general support for GGUF and is slowly adding support for additional model types (the implementation adds a `gguf_file` param to the `from_pretrained` method).

This PR adds support for loading GGUF files to `T5EncoderModel`. I've tested the code with the quants available at https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main and it's working with the current Flux implementation in diffusers.

However, as `FluxTransformer2DModel` is defined in the diffusers library, support has to be added here to be able to load the actual transformer model, which is most (if not all) of a Flux fine-tune.

Examples that can be used:
- with weights quantized as q4_0, q4_1, q5_0, or q5_1
- with weights simply converted from f16
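For reference, q4_0 is the simplest of these block formats: each block of 32 weights stores one fp16 scale followed by 16 bytes of packed 4-bit quants, and each weight is recovered as `(nibble - 8) * scale`. A minimal pure-Python sketch of the dequantization, following the ggml block layout (not the code this PR would actually use, which should come from an existing GGUF reader):

```python
import struct

def dequantize_q4_0(block: bytes) -> list:
    """Dequantize one ggml q4_0 block: a 2-byte fp16 scale plus 16
    bytes holding 32 packed 4-bit quants. Element j comes from the
    low nibble of byte j, element j+16 from the high nibble."""
    assert len(block) == 18  # 2 (fp16 scale) + 16 (packed nibbles)
    (d,) = struct.unpack("<e", block[:2])  # fp16 scale
    qs = block[2:]
    out = [0.0] * 32
    for j in range(16):
        out[j] = ((qs[j] & 0x0F) - 8) * d       # low nibble
        out[j + 16] = ((qs[j] >> 4) - 8) * d    # high nibble
    return out

# Toy block: scale 0.5, every nibble set to 9 -> each weight is (9-8)*0.5
block = struct.pack("<e", 0.5) + bytes([0x99] * 16)
print(dequantize_q4_0(block))  # 32 values, each 0.5
```

The other formats (q4_1, q5_0, q5_1) add a per-block minimum and/or a fifth quant bit on top of the same block structure.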
cc: @yiyixuxu @sayakpaul @DN6