huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Passing `txt_ids` 3d torch.Tensor is deprecated. Please remove the batch dimension and pass it as a 2d torch Tensor #9350

Closed: arrowonstr closed this issue 2 weeks ago

arrowonstr commented 2 months ago

Describe the bug

Passing `txt_ids` 3d torch.Tensor is deprecated. Please remove the batch dimension and pass it as a 2d torch Tensor

Shouldn't the batch_size be 1?
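
For context, the warning is emitted when `txt_ids` arrives at the Flux transformer with a batch dimension: the position ids are identical for every sample in the batch, so newer diffusers versions expect a 2d tensor and drop the extra dimension themselves. A minimal sketch of the shape difference (the sequence length here is only illustrative):

```python
import torch

seq_len = 512  # illustrative T5 sequence length

# Deprecated: 3d ids with a redundant batch dimension -> triggers the warning
text_ids_3d = torch.zeros(1, seq_len, 3)  # shape (batch_size, seq_len, 3)

# Expected: 2d ids, shared by every sample in the batch
text_ids_2d = torch.zeros(seq_len, 3)     # shape (seq_len, 3)
```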

Reproduction

accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of wac linear pendant,black and white,white background" \
  --caption_column="prompt" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --learning_rate=5e-5 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=2000 \
  --repeats=2 \
  --validation_prompt="a photo of wac linear pendant,black and white,white background" \
  --validation_epochs=50 \
  --num_validation_images=1 \
  --rank=16 \
  --checkpointing_steps=200 \
  --seed="0"

Logs

09/03/2024 10:45:35 - INFO - __main__ - ***** Running training *****
09/03/2024 10:45:35 - INFO - __main__ -   Num examples = 26
09/03/2024 10:45:35 - INFO - __main__ -   Num batches each epoch = 26
09/03/2024 10:45:35 - INFO - __main__ -   Num Epochs = 154
09/03/2024 10:45:35 - INFO - __main__ -   Instantaneous batch size per device = 1
09/03/2024 10:45:35 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 2
09/03/2024 10:45:35 - INFO - __main__ -   Gradient Accumulation steps = 2
09/03/2024 10:45:35 - INFO - __main__ -   Total optimization steps = 2000
Steps:   0%|                                                                                                                                                                            | 0/2000 [00:00<?, ?it/s]
Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|                                                                                                                                                       | 0/2000 [00:01<?, ?it/s, loss=0.522, lr=5e-5]
Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor

System Info

Who can help?

No response

yiyixuxu commented 1 month ago

I think you need to modify this line:

https://github.com/huggingface/diffusers/blob/8ecf499d8bda3721ce89f5cb8c804afec4966b6a/examples/dreambooth/train_dreambooth_lora_flux.py#L994

so that it matches

https://github.com/huggingface/diffusers/blob/8ecf499d8bda3721ce89f5cb8c804afec4966b6a/src/diffusers/pipelines/flux/pipeline_flux.py#L363
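
For reference, the pipeline helper linked above builds the ids without a batch dimension. Roughly (variable names follow the pipeline code; treat this as a sketch of that line, not a verbatim copy):

```python
# 2d text ids of shape (seq_len, 3); no batch dimension
text_ids = torch.zeros(prompt_embeds.shape[1], 3).to(device=device, dtype=dtype)
```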

yiyixuxu commented 1 month ago

do you want to open a PR to help us?

LifuWang-66 commented 1 month ago

You should also change the next line, or remove it: https://github.com/huggingface/diffusers/blob/8ecf499d8bda3721ce89f5cb8c804afec4966b6a/examples/dreambooth/train_dreambooth_lora_flux.py#L995
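
Taken together, the suggested change to the training script's T5 prompt-encoding helper would look roughly like this (a sketch assuming the script mirrors the pipeline helper; not the merged patch):

```python
# Before (the two lines flagged above): 3d ids plus a repeat over the batch
# text_ids = torch.zeros(batch_size, prompt_embeds.shape[1], 3).to(device=device, dtype=dtype)
# text_ids = text_ids.repeat(num_images_per_prompt, 1, 1)

# After: 2d ids matching pipeline_flux.py; the repeat is no longer needed
text_ids = torch.zeros(prompt_embeds.shape[1], 3).to(device=device, dtype=dtype)
```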

github-actions[bot] commented 4 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

0x-74 commented 3 weeks ago

Hey, I would love to try this!

0x-74 commented 3 weeks ago

Okay, so all the LoRA tests pass using pytest, but I'm on an M2 MacBook with 8 GB of RAM and therefore unable to run the script:

accelerate launch examples/dreambooth/train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev \
  --dataset_name="google/dreambooth" \
  --output_dir="./big" \
  --mixed_precision="no" \
  --instance_prompt="a photo of wac linear pendant, black and white, white background" \
  --caption_column="prompt" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --learning_rate=5e-5 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=2000 \
  --repeats=2 \
  --validation_prompt="a photo of wac linear pendant, black and white, white background" \
  --validation_epochs=50 \
  --num_validation_images=1 \
  --rank=16 \
  --checkpointing_steps=200 \
  --seed="0" \
  --gradient_checkpointing \
  --use_8bit_adam \
  --cache_latents

I know this is a physical limitation, and I can't use other optimizations like xformers. Is there any other alternative?

yiyixuxu commented 3 weeks ago

@0x-74 thanks! please open a PR!