huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

train_dreambooth_lora_flux with prodigy and train_text_encoder causes IndexError: list index out of range #9464

Open squewel opened 3 days ago

squewel commented 3 days ago

Describe the bug

Running train_dreambooth_lora_flux.py with --train_text_encoder --optimizer="prodigy" fails with IndexError: list index out of range. The offending line is params_to_optimize[2]["lr"] = args.learning_rate (line 1375); the full traceback is under Logs below.
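For context, a minimal sketch of the failure mode (the dict contents are illustrative, not the script's actual tensors): with --train_text_encoder the Flux script builds only two param groups, one for the transformer LoRA and one for the CLIP LoRA, so indexing a third group at [2] is out of range.

```python
# Hypothetical param-group list mirroring the script's structure.
# Flux only trains the transformer + CLIP when --train_text_encoder is set,
# so there is no third (T5) group.
transformer_params = {"params": ["transformer_lora_weights"], "lr": 1.0}
clip_params = {"params": ["clip_lora_weights"], "lr": 5e-6}
params_to_optimize = [transformer_params, clip_params]  # len == 2

learning_rate = 1.0
# Prodigy is meant to use a single initial lr, so the script overwrites
# the per-group lrs:
params_to_optimize[1]["lr"] = learning_rate  # fine: the CLIP group exists
try:
    params_to_optimize[2]["lr"] = learning_rate  # IndexError: only 2 groups
    failed = False
except IndexError:
    failed = True
print(failed)  # True
```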

Reproduction

Run the sample script with the params from the docs:

https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md

To perform DreamBooth LoRA with text-encoder training, run:

export MODEL_NAME="black-forest-labs/FLUX.1-dev"
export OUTPUT_DIR="trained-flux-dev-dreambooth-lora"

accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --train_text_encoder \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --seed="0" \
  --push_to_hub

Logs

09/18/2024 20:06:33 - WARNING - __main__ - Learning rates were provided both for the transformer and the text encoder- e.g. text_encoder_lr: 5e-06 and learning_rate: 1.0. When using prodigy only learning_rate is used as the initial learning rate.
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1891, in <module>
    main(args)
  File "/content/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1375, in main
    params_to_optimize[2]["lr"] = args.learning_rate
IndexError: list index out of range

System Info

colab

Who can help?

@sayakpaul

sayakpaul commented 3 days ago

Cc: @linoytsaban

biswaroop1547 commented 2 days ago

I've added a fix. Since we don't tune the T5 text encoder in Flux, that line is unnecessary (possibly a typo). Let me know if that's not the case, thanks!
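One way to sketch the fix (this is illustrative, not the exact patch): instead of hard-coding index 2, which assumed a T5 group that the Flux script never adds, overwrite the learning rate only on the text-encoder groups that actually exist.

```python
# Hypothetical param groups; only transformer + CLIP exist in the Flux script.
params_to_optimize = [
    {"params": ["transformer_lora"], "lr": 1.0},
    {"params": ["clip_lora"], "lr": 5e-6},
]
learning_rate = 1.0

# Overwrite only the text-encoder groups present in the list,
# so the code no longer assumes a third (T5) group:
for group in params_to_optimize[1:]:
    group["lr"] = learning_rate
```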

linoytsaban commented 1 day ago

Thanks @biswaroop1547! Indeed, for now we support fine-tuning of the CLIP encoder only when --train_text_encoder is enabled.