Open TPreece101 opened 9 months ago
Have you tried setting max_grad_norm = 0.0? This is the command I used and it worked; just change the training script, paths and filenames:

    accelerate launch --num_cpu_threads_per_process=4 "./sdxl_train_network.py" \
      --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 \
      --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors" \
      --train_data_dir="/workspace/dataset/img" \
      --resolution="1024,1024" \
      --output_dir="/workspace/dataset/model" \
      --logging_dir="workspace/dataset/log" \
      --save_model_as=safetensors \
      --network_module=lycoris.kohya \
      --network_args "preset=attn-mlp" "algo=full" "train_norm=True" "rank_dropout=0" "module_dropout=0" "use_tucker=True" "use_scalar=False" "rank_dropout_scale=False" \
      --network_dropout="0" \
      --text_encoder_lr=1.0 --unet_lr=1.0 \
      --output_name="Mstult" \
      --lr_scheduler_num_cycles="100" \
      --no_half_vae --full_bf16 \
      --learning_rate="1.0" \
      --lr_scheduler="cosine" \
      --train_batch_size="2" \
      --max_train_steps="8700" \
      --save_every_n_epochs="10" \
      --mixed_precision="bf16" --save_precision="bf16" \
      --caption_extension=".txt" \
      --cache_latents --cache_latents_to_disk \
      --optimizer_type="Prodigy" \
      --optimizer_args decouple=True weight_decay=0.01 d_coef=2 use_bias_correction=True safeguard_warmup=False betas=0.9,0.999 \
      --max_grad_norm="0" \
      --max_data_loader_n_workers="1" \
      --keep_tokens="1" \
      --vae_batch_size="2" \
      --bucket_reso_steps=32 \
      --min_snr_gamma=5 \
      --shuffle_caption --gradient_checkpointing --persistent_data_loader_workers \
      --noise_offset=0.0357 \
      --vae="/workspace/stable-diffusion-webui/models/VAE/sdxl_vae.safetensors" \
      --sample_sampler=euler_a \
      --sample_prompts="/workspace/dataset/model/sample/prompt.txt" \
      --sample_every_n_steps="870"
Make sure you have enough RAM, because I was getting an error when saving the checkpoint (RTX 3090). I had to use an RTX A6000 on Runpod to get it working.
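If editing that one-liner by hand gets error-prone, a small wrapper might help. This is only a rough sketch, not part of sd-scripts or LyCORIS: `build_command`, `SCRIPT` and `PATHS` are hypothetical names made up for illustration, and it reproduces just a subset of the flags from the command above, so the remaining ones would need to be appended the same way.

```python
import shlex

# Values you would most likely change. The names SCRIPT and PATHS are
# hypothetical; the values are copied from the command quoted above.
SCRIPT = "./sdxl_train_network.py"
PATHS = {
    "pretrained_model_name_or_path": "/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors",
    "train_data_dir": "/workspace/dataset/img",
    "output_dir": "/workspace/dataset/model",
    "output_name": "Mstult",
}


def build_command(script, paths, max_grad_norm=0.0):
    """Return an argv list covering a subset of the flags in the command above."""
    cmd = ["accelerate", "launch", "--num_cpu_threads_per_process=4", script]
    # One --key=value argument per entry in the paths dictionary.
    cmd += [f"--{key}={value}" for key, value in paths.items()]
    cmd += [
        "--network_module=lycoris.kohya",
        "--network_args", "preset=attn-mlp", "algo=full",
        "--optimizer_type=Prodigy",
        # The setting suggested above; 0.0 is formatted as "0".
        f"--max_grad_norm={max_grad_norm:g}",
    ]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted version so it can be inspected before running.
    print(shlex.join(build_command(SCRIPT, PATHS)))
```

Printing the command first makes it easy to check the paths; the same list could then be passed to `subprocess.run(cmd, check=True)` once it looks right.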
Hi, I'm having trouble training a LyCORIS using the "full" algorithm. This is the error I get:
I'm wondering if there is anything obviously wrong in my config file (below). It works fine with other algorithms such as lora and lokr, which is strange.
Extra Info
Let me know if you need any extra info. I'm quite new to training LoRAs / LyCORIS, so I might be doing something silly.