Closed: waomodder closed this issue 2 months ago.
You need to provide the base learning_rate, same as before.
Also your network_alpha seems crazy high; you should use the same value as network_dim or lower.
Plus huber_schedule is ignored when loss_type is not "huber" (Huber loss is currently not supported in Flux).
@recris Thanks for pointing this out. Could you also tell us which settings you think work best with adamwschedulefree?
I typically start with network_dim = 16, network_alpha = 8 and learning_rate = 2e-4, then tweak the LR from there.
I also recommend training at a lower resolution first (like 640px) while experimenting with different parameters; it's much quicker for figuring out what the optimal settings are (see the sketch below).
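As a concrete example, an experimentation pass along those lines could use flags like these (a sketch built from the starting values above; they are assumptions to tune from, not settings known to be optimal for this dataset):

--resolution 640 --network_dim 16 --network_alpha 8 --learning_rate 2e-4

Once the parameters look good at 640px, the run can be repeated at --resolution 1024 for the final training.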
Thank you for your detailed guidance.
I am using the ScheduleFree optimiser from https://github.com/kohya-ss/sd-scripts/pull/1600, but the initial loss value is over 3000 and convergence is too slow. Even after an hour, the loss does not come anywhere near the sub-1.0 values (around 0.3) I reach when using AdamW. Can you please tell me if there are any recommended settings?
Command:

accelerate launch --num_cpu_threads_per_process 20 flux_train_network.py ^
  --pretrained_model_name_or_path "D:\ComfyUI_windows_portable\ComfyUI\models\unet\flux1devpro2.safetensors" ^
  --train_data_dir "D:\Lora_learning\Data\asset\super_robot_diffusion_F" ^
  --output_dir "D:\Lora_learning\Data\output" ^
  --network_module "networks.lora_flux" ^
  --gradient_checkpointing --persistent_data_loader_workers ^
  --cache_latents --cache_latents_to_disk --max_data_loader_n_workers 12 ^
  --enable_bucket --save_model_as "safetensors" --lr_scheduler_num_cycles 4 ^
  --mixed_precision "bf16" --resolution 1024 --train_batch_size 1 --max_train_epochs 10 ^
  --network_dim 32 --network_alpha 256.0 ^
  --save_every_n_epochs 1 --save_every_n_steps 250 ^
  --optimizer_type "adamwschedulefree" --output_name "SRD_F_v05_t11" ^
  --ae "D:\ComfyUI_windows_portable\ComfyUI\models\vae\ae.safetensors" ^
  --bucket_no_upscale --save_precision "fp16" --min_bucket_reso 320 --max_bucket_reso 2048 ^
  --caption_extension ".txt" --seed 42 --fp8_base --highvram ^
  --loss_type "l2" --huber_schedule "snr" --gradient_accumulation_steps 2 ^
  --timestep_sampling flux_shift --model_prediction_type "raw" --guidance_scale 1 ^
  --clip_l "D:\stable-diffusion-webui\models\CLIP\clip_l.safetensors" ^
  --t5xxl "D:\stable-diffusion-webui\models\CLIP\t5xxl_fp16.safetensors" ^
  --sdpa --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk ^
  --network_weights "D:\Lora_learning\Data\output\SRD_F_v05_t10-000008.safetensors"