wangherr opened this issue 1 month ago
I solve it by:

```python
flux_controlnet.train()
if args.num_single_layers == 0:
    flux_controlnet.transformer_blocks[-1].attn.to_add_out.requires_grad_(False)
    flux_controlnet.transformer_blocks[-1].ff_context.requires_grad_(False)
...
# params_to_optimize = flux_controlnet.parameters()
params_to_optimize = [param for param in flux_controlnet.parameters() if param.requires_grad]
```

but I am not sure if my modifications are logically correct.
Cc: @PromeAIpro
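A quick way to check whether such a change actually covers all the offending parameters (easiest in a single, non-distributed debug run) is to list every parameter that is still trainable but received no gradient after one backward pass; the indices reported in the DDP error should correspond to exactly those parameters. This is only a sketch: `flux_controlnet` and `loss` are assumed to be the objects from the training loop in train_controlnet_flux.py.

```python
# Debug sketch (not part of the original script): after one backward pass,
# report trainable parameters that received no gradient.
# `flux_controlnet` and `loss` are assumed to exist in the training loop.
loss.backward()
unused = [
    name
    for name, p in flux_controlnet.named_parameters()
    if p.requires_grad and p.grad is None
]
print(f"{len(unused)} trainable parameters received no grad:", unused)
```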
I met the same problem.
> I solve it by:
>
> ```python
> flux_controlnet.train()
> if args.num_single_layers == 0:
>     flux_controlnet.transformer_blocks[-1].attn.to_add_out.requires_grad_(False)
>     flux_controlnet.transformer_blocks[-1].ff_context.requires_grad_(False)
> ...
> # params_to_optimize = flux_controlnet.parameters()
> params_to_optimize = [param for param in flux_controlnet.parameters() if param.requires_grad]
> ```
>
> but I am not sure if my modifications are logically correct.

I wonder why you changed these two modules, and if the last transformer block's requires_grad is False, can the gradient still be backpropagated to the earlier layers? Thanks!
> ```python
> if args.num_single_layers == 0:
>     flux_controlnet.transformer_blocks[-1].attn.to_add_out.requires_grad_(False)
>     flux_controlnet.transformer_blocks[-1].ff_context.requires_grad_(False)
> ```
I solved it by using deepspeed, zero_stage:2
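For reference, ZeRO stage 2 is usually turned on through `accelerate config`; the equivalent Python form looks roughly like the sketch below (illustrative values, not how train_controlnet_flux.py is wired up by default). DeepSpeed replaces torch DDP's gradient reducer, which is presumably why the unused-parameter error no longer appears.

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Sketch: enable DeepSpeed ZeRO stage 2 via accelerate's DeepSpeedPlugin.
# In practice the same settings are typically written by `accelerate config`
# into its YAML config file rather than passed in code.
deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=deepspeed_plugin)
```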
> I wonder why you changed these two modules, and if the last transformer block's requires_grad is False, can the gradient still be backpropagated to the earlier layers? Thanks!
```python
if args.num_single_layers == 0:
    flux_controlnet.transformer_blocks[-1].attn.to_add_out.requires_grad_(False)
    flux_controlnet.transformer_blocks[-1].ff_context.requires_grad_(False)
```
In a double block there is both text attn and image attn; I just removed the grad of the text attn.
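On the gradient-flow part of the question, here is a toy illustration (plain PyTorch layers standing in for the FLUX blocks, nothing from the actual script): calling `requires_grad_(False)` on a later module only stops that module's weights from updating; gradients still propagate through it to earlier layers.

```python
import torch
from torch import nn

# Toy stand-ins for the transformer blocks; not the actual FLUX modules.
earlier = nn.Linear(4, 4)
last = nn.Linear(4, 4)
last.requires_grad_(False)  # analogous to freezing to_add_out / ff_context

loss = last(earlier(torch.randn(2, 4))).sum()
loss.backward()

print(earlier.weight.grad is not None)  # True: earlier layer still gets grads
print(last.weight.grad is None)         # True: frozen layer receives no grads
```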
Hi, did you run into a similar error when training controlnet_sd3?
Describe the bug
While training flux-controlnet on a multi-GPU server with training restricted to a single GPU, setting `num_single_layers=0` leads to an error:
```
[rank0]: Parameter indices which did not receive grad for rank 0: 64 65 72 73 74 75
```
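This message comes from torch DDP complaining that some registered parameters never received gradients during backward. Besides the workarounds discussed above (freezing the unused modules, or switching to DeepSpeed), a generic option is to let DDP tolerate unused parameters through accelerate's kwargs handler; this is a sketch, not something the script enables by default, and `find_unused_parameters=True` adds some per-step overhead.

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# Sketch: allow parameters that receive no gradient instead of raising the
# "did not receive grad" error. This would have to be added where
# train_controlnet_flux.py constructs its Accelerator.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```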
Reproduction
```bash
accelerate launch --gpu_ids='0,' --num_processes=1 --num_machines=1 --main_process_port 28700 train_controlnet_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-schnell" \
  --dataset_name="lucataco/fill1k" \
  --conditioning_image_column=conditioning_image \
  --image_column=image \
  --caption_column=text \
  --output_dir="logs" \
  --mixed_precision="bf16" \
  --resolution=512 \
  --learning_rate=1e-5 \
  --max_train_steps=15000 \
  --validation_steps=100 \
  --checkpointing_steps=200 \
  --validation_image "./example_images/conditioning_image_1.png" "./example_images/conditioning_image_2.png" \
  --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --report_to="tensorboard" \
  --num_double_layers=2 \
  --num_single_layers=0 \
  --seed=42 \
  --enable_model_cpu_offload \
  --use_8bit_adam \
  --use_adafactor \
  --gradient_checkpointing
```
Logs
System Info
Who can help?
@sayakpaul