FLUX ControlNet training fails: embedding tensor sizes do not match

yapengyu commented 3 weeks ago

Describe the bug

I am trying to train a FLUX ControlNet, following 'train_controlnet_flux.py' and 'readme_flux.txt'.

Reproduction

Use the dataset 'fusing/fill50k' and the parameters mentioned in 'readme_flux.txt'.

Logs

Traceback (most recent call last):
  File "/path_to_conda/projects/face-workspace-ai-vision-face/diffusion/15.HandPose/train_controlnet/train_controlnet_flux.py", line 1452, in <module>
    main(args)
  File "/path_to_conda/projects/face-workspace-ai-vision-face/diffusion/15.HandPose/train_controlnet/train_controlnet_flux.py", line 1310, in main
    controlnet_block_samples, controlnet_single_block_samples = flux_controlnet(
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/diffusers/models/controlnet_flux.py", line 354, in forward
    encoder_hidden_states, hidden_states = block(
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/diffusers/models/transformers/transformer_flux.py", line 175, in forward
    attn_output, context_attn_output = self.attn(
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 495, in forward
    return self.processor(
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 1778, in __call__
    query = apply_rotary_emb(query, image_rotary_emb)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/diffusers/models/embeddings.py", line 734, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
RuntimeError: The size of tensor a (1536) must match the size of tensor b (768) at non-singleton dimension 2
Steps:   0%|                                                                                | 0/15000 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "/path_to_conda/anaconda3/envs/diffusers-newest/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/path_to_conda/anaconda3/envs/diffusers-newest/lib/python3.9/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

System Info

diffusers 0.31.0
accelerate 1.0.1
transformers 4.46.1
torch 2.1.2+cu118
torchvision 0.16.2+cu118

Who can help?

@ScilenceForest @sayakpaul

ScilenceForest commented 3 weeks ago

The changes I made were for periodic validation during training. Your error seems to occur before training has actually started, so the original author, @PromeAIpro, may be needed. Also, in my experience you need to double-check that the paths, types, etc. of the various models match.

yapengyu commented 3 weeks ago

@PromeAIpro Hi, would you mind taking a look at this bug? Thanks.

PromeAIpro commented 3 weeks ago

Can you show the detailed parameters you used?

yapengyu commented 3 weeks ago

@PromeAIpro The parameters are as follows:

accelerate launch train_controlnet_flux.py \
    --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
    --dataset_name=fusing/fill50k \
    --conditioning_image_column=conditioning_image \
    --image_column=image \
    --caption_column=text \
    --output_dir=$OUTPUT_DIR \
    --mixed_precision="fp16" \
    --resolution=512 \
    --learning_rate=1e-5 \
    --max_train_steps=15000 \
    --validation_steps=100 \
    --checkpointing_steps=200 \
    --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
    --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --num_double_layers=4 \
    --num_single_layers=0 \
    --seed=42 \
    --cache_dir=$CACHE_DIR \
    --max_train_samples=100

PromeAIpro commented 3 weeks ago

It will take some time to work out what is going on in this PR: https://github.com/huggingface/diffusers/pull/9711. My guess is that it causes your issue:


                latent_image_ids = FluxControlNetPipeline._prepare_latent_image_ids(
                    batch_size=pixel_latents_tmp.shape[0],
                    height=pixel_latents_tmp.shape[2] // 2,
                    width=pixel_latents_tmp.shape[3] // 2,
                    device=pixel_values.device,
                    dtype=pixel_values.dtype,
                )

wuutiing commented 3 weeks ago

> @PromeAIpro Hi, would you mind taking a look at this bug? Thanks.

Checking out the training script at 0.31.0 may work and is probably the quickest fix.

yapengyu commented 3 weeks ago

> It will take some time to work out what is going on in this PR (#9711). My guess is that it causes your issue:
>
>                 latent_image_ids = FluxControlNetPipeline._prepare_latent_image_ids(
>                     batch_size=pixel_latents_tmp.shape[0],
>                     height=pixel_latents_tmp.shape[2] // 2,
>                     width=pixel_latents_tmp.shape[3] // 2,
>                     device=pixel_values.device,
>                     dtype=pixel_values.dtype,
>                 )

Oh, thanks! After commenting out the "// 2" on height and width, it works. Thanks a lot!
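
For anyone hitting the same error, here is a rough sketch of the arithmetic that seems to explain the 1536 vs 768 mismatch. It rests on assumptions that are not confirmed in this thread: an 8x VAE downsampling factor, 2x2 latent packing, 512 text tokens, and an installed 0.31.0 `_prepare_latent_image_ids` that already halves height and width internally. The helper names are hypothetical and do not exist in diffusers.

```python
# Assumptions (not confirmed in this thread): 8x VAE downsampling,
# 2x2 latent packing, and 512 text tokens in the joint sequence.
VAE_SCALE = 8
PATCH = 2
TEXT_TOKENS = 512


def packed_image_tokens(resolution: int) -> int:
    """Number of image tokens after the VAE and 2x2 packing."""
    latent_side = resolution // VAE_SCALE
    return (latent_side // PATCH) ** 2


def image_id_count(latent_side: int, halve_in_script: bool, halve_in_library: bool) -> int:
    """Number of positional ids produced for a square latent.

    Each flag models one place where the side length gets divided by 2:
    the training script's "// 2" and the library helper's own halving.
    """
    side = latent_side
    if halve_in_script:
        side //= 2
    if halve_in_library:
        side //= 2
    return side * side


latent_side = 512 // VAE_SCALE  # --resolution=512 gives a 64x64 latent

# Hidden states: packed image tokens plus text tokens.
sequence_length = packed_image_tokens(512) + TEXT_TOKENS

# Script from main ("// 2") combined with a library that halves again.
ids_halved_twice = image_id_count(latent_side, True, True) + TEXT_TOKENS

# After commenting out the "// 2" in the script.
ids_halved_once = image_id_count(latent_side, False, True) + TEXT_TOKENS

print(sequence_length, ids_halved_twice, ids_halved_once)
```

Under these assumptions the side length is halved twice, so the rotary embedding covers 256 image positions instead of 1024, which lines up with the sizes in the RuntimeError.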

yapengyu commented 3 weeks ago

> > @PromeAIpro Hi, would you mind taking a look at this bug? Thanks.
>
> Checking out the training script at 0.31.0 may work and is probably the quickest fix.

Oh, you're right, thanks! At the v0.31.0 tag the bug is gone, while it still exists on the main branch.

PromeAIpro commented 3 weeks ago

It seems this is still under development, and the training scripts on main are written against the latest dev branch. Checking out 0.31.0 (the same version as the release) also makes sense. See https://github.com/huggingface/diffusers/pull/9711
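
One way to catch this kind of mismatch early is to compare the tag the training script was checked out from with the installed package version before launching. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that matching major.minor numbers are enough for the script and the library to agree.

```python
from importlib.metadata import PackageNotFoundError, version


def release_of(version_string: str) -> tuple:
    """Return (major, minor) from a version string such as '0.31.0' or '0.32.0.dev0'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def script_matches_install(script_tag: str, installed: str) -> bool:
    """True when the script's tag and the installed package share major.minor."""
    return release_of(script_tag.lstrip("v")) == release_of(installed)


def installed_diffusers_version():
    """Return the installed diffusers version string, or None if it is not installed."""
    try:
        return version("diffusers")
    except PackageNotFoundError:
        return None
```

For example, `script_matches_install("v0.31.0", "0.31.0")` is true, while a script taken from main against an installed release, e.g. `script_matches_install("v0.32.0", "0.31.0")`, is false.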
