kohya-ss / sd-scripts


Problem with "Gradient checkpointing" #1183

Closed ivanced09 closed 6 months ago

ivanced09 commented 7 months ago

People, I have the following problem when I use the mentioned option (gradient checkpointing), both when training with 1.5 and with XL:

[screenshot of the warning output]

Additionally, for the last couple of updates I have been seeing the following two warnings. They don't seem to affect anything, but I would like to know whether there are any settings I should change:

WARNING because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 / max_grad_normが設定されているためclip_grad_normが有効になります。0に設定して無効にしたほうがいいかもしれません  train_util.py:3807
WARNING constant_with_warmup will be good / スケジューラはconstant_with_warmupが良いかもしれません  train_util.py:3811

My training setup is as follows:

accelerate launch --gpu_ids="0" --num_cpu_threads_per_process=2 "E:\kohya_ss/sd-scripts/sdxl_train_network.py" --bucket_no_upscale --bucket_reso_steps=64 --cache_latents --cache_latents_to_disk --caption_extension=".txt" --flip_aug --gradient_checkpointing --learning_rate="0.0001" --lr_scheduler="constant" --lr_scheduler_num_cycles="1" --max_data_loader_n_workers="0" --max_grad_norm="1" --resolution="1024,1024" --max_train_steps="1920" --mixed_precision="bf16" --network_alpha="8" --network_dim=16 --network_module=networks.lora --no_half_vae --optimizer_args scale_parameter=False relative_step=False warmup_init=False --optimizer_type="Adafactor" --output_dir="D:\Lora work\Metaverso\Elizabeth9\model" --output_name="cara delevingne woman-XL01A" --pretrained_model_name_or_path="E:/webui_forge_cu121_torch21/webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors" --reg_data_dir="D:\Lora work\Metaverso\Elizabeth9\reg" --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="bf16" --save_state --scale_weight_norms="1" --text_encoder_lr=0.0001 --train_batch_size="1" --train_data_dir="D:\Lora work\Metaverso\Elizabeth9\img" --unet_lr=0.0001 --xformers

GPU: RTX 3060 12gb

kohya-ss commented 7 months ago

The warning about use_reentrant only warns that the specification will change in the future, so there is no issue for now. I will update the script before the spec changes.
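
For reference, that warning comes from PyTorch itself: torch.utils.checkpoint.checkpoint() currently defaults to use_reentrant=True, the default will change to False in a future release, and PyTorch warns whenever the argument is not passed explicitly. A minimal sketch of the PyTorch API (not the sd-scripts code, just an illustration):

```python
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(16, 16)
x = torch.randn(4, 16, requires_grad=True)

# Passing use_reentrant explicitly silences the warning and pins the behavior;
# False selects the non-reentrant implementation that will become the default.
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()
```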

The Adafactor optimizer assumes that max_grad_norm is set to 0 so that clip_grad_norm is disabled, which is why the warning is shown. The constant_with_warmup scheduler is also the one the Adafactor optimizer assumes.
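
As an illustration, here is a minimal sketch of the combination being assumed, using the Adafactor implementation and scheduler helper from the transformers library (the actual wiring inside train_util.py may differ):

```python
import torch
from transformers.optimization import Adafactor, get_constant_schedule_with_warmup

model = torch.nn.Linear(16, 16)

# With relative_step=False and scale_parameter=False, Adafactor uses the
# external learning rate as-is instead of computing its own step size.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)

# constant_with_warmup: ramp the LR up over the first 100 steps, then keep it flat.
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=100)

for _ in range(5):  # dummy loop just to show the call order
    loss = model(torch.randn(4, 16)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```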

Therefore, please specify options like --max_grad_norm=0 --lr_scheduler="constant_with_warmup" --lr_warmup_steps=100.
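
Applied to the command above, that means replacing --max_grad_norm="1" --lr_scheduler="constant" with:

--max_grad_norm=0 --lr_scheduler="constant_with_warmup" --lr_warmup_steps=100

and leaving the other options unchanged.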

ivanced09 commented 6 months ago

Thank you very much for the clarification.