I observe this as well: different initialization methods lead to slightly different results, even with the same seed.
Could anyone suggest the proper way to instantiate the Flux model, especially when loading LoRA weights?
Hi, maybe it's because of the T5; it has a problem when casting the weights, related to #8604.
Maybe you can try an experiment with an empty T5 prompt and compare the results.
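For example, a rough sketch of that experiment (the prompt and seed are placeholders, and `pipe` is assumed to be an already-loaded `FluxPipeline`; in it, the first prompt goes to CLIP and `prompt_2` goes to the T5 encoder, and an empty string may fall back to the CLIP prompt, so a single space is used here):

```python
import torch

# Normal run: the same prompt is used for both CLIP and T5.
generator = torch.Generator("cpu").manual_seed(42)
image_full = pipe("a photo of a cat", generator=generator).images[0]

# "Empty" T5 prompt: `prompt_2` is routed to the T5 encoder. A single space is
# passed because an empty string may fall back to the CLIP prompt.
generator = torch.Generator("cpu").manual_seed(42)
image_no_t5 = pipe("a photo of a cat", prompt_2=" ", generator=generator).images[0]
```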
The proper way to load the model is with `from_pretrained`. Loading the modules separately is for training or for more specific needs where you need to have them separated.
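For reference, a minimal sketch of that recommended path (the LoRA repository name below is a placeholder, not a real checkpoint):

```python
import torch
from diffusers import FluxPipeline

# Load the full pipeline in one call.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

# LoRA weights can then be attached on top of the assembled pipeline.
pipe.load_lora_weights("your-username/your-flux-lora")  # hypothetical repo
pipe.to("cuda")
```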
@asomoza yes indeed! Especially since option 2 always seems to give better results than option 1?
@asomoza @yiyixuxu Will this be resolved if #8604 is resolved?
However, when using Flux, there is always a limitation of 77 tokens in length.
> Will this be resolved if https://github.com/huggingface/diffusers/issues/8604 is resolved?
Yes, but it's not a diffusers problem; this is a transformers model, so the fix should come from them. The issue is that this doesn't seem to affect text inference.
So the best fix right now is to load the T5 using the officially recommended method.
> However, when using Flux, there is always a limitation of 77 tokens in length.
This isn't related to this issue; the limitation comes from the CLIP model, not the T5, and all the training and the official code work like this. You can use sd_embed if you want to circumvent the limit.
Got it. Let me put it here.
When loading a Flux (or SD3 / PixArt) pipeline piece by piece in torch.float16, do it this way.

Set `torch_dtype` at loading time:
```python
import torch
from transformers import T5EncoderModel

base_model_path = "black-forest-labs/FLUX.1-dev"
dtype = torch.float16
# "..." stands for the remaining from_pretrained arguments (e.g. the subfolder).
text_encoder = T5EncoderModel.from_pretrained(base_model_path, ..., torch_dtype=dtype)
```
Instead of casting the model to a dtype afterwards:
```python
import torch
from transformers import T5EncoderModel

base_model_path = "black-forest-labs/FLUX.1-dev"
dtype = torch.float16
text_encoder = T5EncoderModel.from_pretrained(base_model_path)  # loads in the checkpoint's default dtype
text_encoder.to(dtype=dtype)
```
I assume that will produce the same images; I need to validate that.
Confirming that. Partial loading and direct loading produce similar results if the pipeline is loaded as mentioned above.
[Comparison images: Partial Loading vs. Direct Loading]
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing this issue since the question was answered and the problem was resolved.
Describe the bug
I've observed strange behavior when loading the Flux.1-dev model. There are two ways to load the model that produce different results when run with the same seed. One of the options is from the HF diffusers docs; the second is inspired by the ai-toolkit repo.
Reproduction
First option: use `from_pretrained` on `FluxPipeline`. Second option: load the pipeline piece by piece.

Common initialization:
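The original snippets were not preserved here, so the blocks below are a hedged reconstruction of what the setup might have looked like; the prompt, seed, dtype, and device are assumptions.

```python
import torch
from diffusers import FluxPipeline

base_model_path = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16      # assumed dtype
device = "cuda"
prompt = "a photo of a cat"  # placeholder prompt
seed = 42
```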
Option 1:
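A sketch of option 1, loading the whole pipeline in one call as in the diffusers docs, building on the common initialization above:

```python
# Option 1: load everything through FluxPipeline.from_pretrained.
pipe = FluxPipeline.from_pretrained(base_model_path, torch_dtype=dtype).to(device)
```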
Option 2:
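A sketch of option 2, loading each module separately and assembling the pipeline by hand; the subfolder names follow the FLUX.1-dev repository layout:

```python
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast
from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, FluxTransformer2DModel

# Load each component from its subfolder of the base repository.
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(base_model_path, subfolder="scheduler")
tokenizer = CLIPTokenizer.from_pretrained(base_model_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(base_model_path, subfolder="text_encoder", torch_dtype=dtype)
tokenizer_2 = T5TokenizerFast.from_pretrained(base_model_path, subfolder="tokenizer_2")
text_encoder_2 = T5EncoderModel.from_pretrained(base_model_path, subfolder="text_encoder_2", torch_dtype=dtype)
vae = AutoencoderKL.from_pretrained(base_model_path, subfolder="vae", torch_dtype=dtype)
transformer = FluxTransformer2DModel.from_pretrained(base_model_path, subfolder="transformer", torch_dtype=dtype)

# Assemble the pipeline from the separately loaded modules.
pipe = FluxPipeline(
    scheduler=scheduler,
    tokenizer=tokenizer,
    text_encoder=text_encoder,
    tokenizer_2=tokenizer_2,
    text_encoder_2=text_encoder_2,
    vae=vae,
    transformer=transformer,
).to(device)
```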
The inference code:
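A sketch of the inference call; the resolution, step count, and guidance scale are assumptions:

```python
# Fixed seed so both loading options can be compared directly.
generator = torch.Generator(device="cpu").manual_seed(seed)
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=3.5,
    generator=generator,
).images[0]
image.save("flux.png")
```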
Logs
No response
System Info
diffusers == 0.31.0.dev0
Who can help?
No response