kohya-ss / sd-scripts


Recommended settings for ScheduleFree. #1631

Closed · waomodder closed this 2 months ago

waomodder commented 2 months ago

https://github.com/kohya-ss/sd-scripts/pull/1600 I am using the ScheduleFree optimizer, but the initial loss value is over 3000 and convergence is far too slow: even after an hour of training, the loss does not come anywhere near the ~0.3 I reach with AdamW. Could you tell me if there are any recommended settings for it?

Command:

```
accelerate launch --num_cpu_threads_per_process 20 flux_train_network.py \
  --pretrained_model_name_or_path "D:\ComfyUI_windows_portable\ComfyUI\models\unet\flux1devpro2.safetensors" \
  --train_data_dir "D:\Lora_learning\Data\asset\super_robot_diffusion_F" \
  --output_dir "D:\Lora_learning\Data\output" \
  --network_module "networks.lora_flux" \
  --gradient_checkpointing --persistent_data_loader_workers \
  --cache_latents --cache_latents_to_disk --max_data_loader_n_workers 12 \
  --enable_bucket --save_model_as "safetensors" --lr_scheduler_num_cycles 4 \
  --mixed_precision "bf16" --resolution 1024 --train_batch_size 1 \
  --max_train_epochs 10 --network_dim 32 --network_alpha 256.0 \
  --save_every_n_epochs 1 --save_every_n_steps 250 \
  --optimizer_type "adamwschedulefree" --output_name "SRD_F_v05_t11" \
  --ae "D:\ComfyUI_windows_portable\ComfyUI\models\vae\ae.safetensors" \
  --bucket_no_upscale --save_precision "fp16" \
  --min_bucket_reso 320 --max_bucket_reso 2048 --caption_extension ".txt" \
  --seed 42 --fp8_base --highvram --loss_type "l2" --huber_schedule "snr" \
  --gradient_accumulation_steps 2 --timestep_sampling flux_shift \
  --model_prediction_type "raw" --guidance_scale 1 \
  --clip_l "D:\stable-diffusion-webui\models\CLIP\clip_l.safetensors" \
  --t5xxl "D:\stable-diffusion-webui\models\CLIP\t5xxl_fp16.safetensors" \
  --sdpa --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk \
  --network_weights "D:\Lora_learning\Data\output\SRD_F_v05_t10-000008.safetensors"
```

[image attachment]

recris commented 2 months ago

You need to provide the base learning_rate, same as before.

Also, your network_alpha seems far too high; you should use the same value as network_dim, or lower.

Plus huber_schedule is ignored when loss_type is not "huber" (Huber loss is currently not supported in Flux).
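
Putting those three points together, the relevant flags in the command above would change to something like this (a sketch only; the 2e-4 learning rate is a common starting point, not a tested value for this dataset):

```
--learning_rate 2e-4   # ScheduleFree removes the LR schedule, not the base LR; it must still be set
--network_alpha 32.0   # no higher than --network_dim (32 in your command)
--loss_type "l2"       # keep l2; --huber_schedule "snr" can simply be dropped, it has no effect here
```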

waomodder commented 2 months ago

@recris Thanks for pointing this out. Could you share the `adamwschedulefree` settings that you think work best?

recris commented 2 months ago

I typically start with network_dim = 16, network_alpha = 8, and learning_rate = 2e-4, then tweak the LR from there.

I also recommend training at a lower resolution first (like 640px) while experimenting with different parameters; it's much quicker when you're figuring out what the optimal settings are.
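
For example, an experimentation run along those lines could look like the sketch below (only the changed flags are shown; the values are starting points to tweak, not tested optima):

```
accelerate launch --num_cpu_threads_per_process 20 flux_train_network.py \
  --optimizer_type "adamwschedulefree" \
  --learning_rate 2e-4 \
  --network_dim 16 \
  --network_alpha 8 \
  --resolution 640
  # ...plus the same model, data, and caching flags as in the original command
```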

waomodder commented 2 months ago

Thank you for your detailed guidance.