huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Flux training seems not to update the transformer model #9861

Open weixiong-ur opened 11 hours ago

weixiong-ur commented 11 hours ago

Describe the bug

When I load the transformer checkpoint saved by the training script train_dreambooth_flux.py, it is exactly the same as the pretrained FLUX.1-dev model, so I suspect that training is not updating the parameters. I also noticed that optimizer.bin in the checkpoint save directory is only 1.3K, which seems abnormally small. Checkpoints saved by train_dreambooth_sd3.py work as expected; the problem only occurs with train_dreambooth_flux.py.
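One way to look into the small optimizer.bin is to inspect what it contains. The sketch below assumes the file is a plain `torch.save()` of the optimizer's `state_dict()` (that may not hold with DeepSpeed or FSDP), and the helper names `summarize_optimizer_state` and `load_optimizer_state` are hypothetical, not part of diffusers or accelerate.

```python
from typing import Any, Dict


def summarize_optimizer_state(state_dict: Dict[str, Any]) -> Dict[str, int]:
    """Count what a torch-style optimizer state_dict actually tracks."""
    param_groups = state_dict.get("param_groups", [])
    # Each group lists the integer ids of the parameters it optimizes.
    tracked = sum(len(group.get("params", [])) for group in param_groups)
    # "state" holds per-parameter buffers; for common torch optimizers it is
    # typically filled in lazily, on the first step that sees a gradient.
    with_state = len(state_dict.get("state", {}))
    return {
        "param_groups": len(param_groups),
        "tracked_params": tracked,
        "params_with_state": with_state,
    }


def load_optimizer_state(path: str) -> Dict[str, Any]:
    """Load optimizer.bin, assuming it is a torch.save()'d optimizer state_dict."""
    import torch  # imported lazily so the summary helper has no torch dependency

    return torch.load(path, map_location="cpu")
```

Calling `summarize_optimizer_state(load_optimizer_state(path))` on the optimizer.bin in the checkpoint directory would show how many parameters the optimizer was given and how many of them have any state. If either count is far below the number of transformer parameters, that would be consistent with the weights not being updated, though it does not by itself identify the cause.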

Reproduction

A testing script is like this:

import torch
from diffusers import FluxTransformer2DModel

# Pretrained transformer used as the reference.
transformer1 = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
transformer1.eval()
initial_params = {name: param.detach().clone() for name, param in transformer1.named_parameters()}

# The folder that holds the transformer checkpoint saved with accelerator.save_state()
transformer_path = '/xxx/checkpoint-2/transformer'
transformer2 = FluxTransformer2DModel.from_pretrained(
    transformer_path, torch_dtype=torch.bfloat16
)

# Print every parameter that differs from the pretrained weights.
for name, param in transformer2.named_parameters():
    if not torch.equal(initial_params[name], param.detach()):
        print(name, 'does not match')
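`torch.equal` only answers yes or no per tensor. As a complementary check, the sketch below reports the largest absolute difference for each tensor that differs, computed in float32. The helper name `max_abs_diff` is hypothetical, and it assumes both mappings contain the same parameter names on the same device.

```python
from typing import Dict

import torch


def max_abs_diff(
    reference: Dict[str, torch.Tensor], candidate: Dict[str, torch.Tensor]
) -> Dict[str, float]:
    """Return the largest absolute difference per tensor, keyed by name.

    Only tensors that differ from the reference are reported.
    """
    diffs = {}
    for name, ref in reference.items():
        # Upcast first so the subtraction itself is not rounded in bfloat16.
        delta = (candidate[name].detach().float() - ref.detach().float()).abs().max().item()
        if delta > 0.0:
            diffs[name] = delta
    return diffs
```

With the script above this could be called as `max_abs_diff(initial_params, dict(transformer2.named_parameters()))`; an empty result means every tensor matches exactly. Loading both models with `torch_dtype=torch.float32` instead of `torch.bfloat16` would additionally rule out rounding at load time as the reason the tensors compare equal. That is a suggestion for narrowing things down, not something the report confirms.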

Logs

Running the test script above shows that the saved transformer is exactly the same as the pretrained transformer.

System Info

Who can help?

@sayakpaul

sayakpaul commented 10 hours ago

Cc: @linoytsaban