kohya-ss / sd-scripts


[help] Is generating samples during LoRA training any different from generating them using sdxl_gen_img.py? #1214

Closed suede299 closed 7 months ago

suede299 commented 7 months ago

The samples generated during LoRA training look completely normal, but when I generate images with the resulting LoRA file after training finishes, they look totally wrong. (Not wrong in the sense that the content isn't what I expected, but wrong in the sense that the images are full of black pinstripes or similar artifacts.) I'm wondering whether the two paths generate images differently.

suede299 commented 7 months ago

The generation parameters I use with sdxl_gen_img.py are exactly the same as those used for the training samples. Testing the LoRA in Auto1111 and ComfyUI both gives the broken images; only the sample images generated during training look normal.
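For reference, the kind of invocation I mean looks roughly like this (paths, prompt, and seed are placeholders, shown with a standard LoRA module; the exact flag set should be checked against the script's --help):

python sdxl_gen_img.py --ckpt C:/AI/sd_xl_base_1.0.safetensors --outdir outputs --bf16 --xformers --W 1024 --H 1024 --steps 28 --scale 7.0 --sampler euler_a --network_module networks.lora --network_weights my_lora.safetensors --network_mul 1.0 --seed 1 --prompt "same prompt text as in the training sample prompt file"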

Mosfett1975 commented 7 months ago

Yes, I have exactly the same problem. I've generated a couple of LoRAs; the samples look great, but when I tried to use them with Fooocus, I couldn't get anything like the samples.

kohya-ss commented 7 months ago

sdxl_gen_img.py and gen_imgs.py use the same sampler, U-Net, and Text Encoders as the sample generation during training, so I have no idea what causes this issue.

Could you please share the command line for training LoRA?

Mosfett1975 commented 7 months ago

sdxl_gen_img.py and gen_imgs.py use the same sampler, U-Net, and Text Encoders as the sample generation during training, so I have no idea what causes this issue.

Could you please share the command line for training LoRA?

With that config I got perfect samples during training (after ~500 steps): https://github.com/Mosfett1975/LORA_conf/blob/main/adafactor1.json

With the command line below it was even faster; I got a decent sample after 65 steps (1 epoch), but when I checked the LoRA with Fooocus I didn't find anything even remotely resembling that sample:

accelerate launch --num_cpu_threads_per_process=2 "C:\AI\kohya_dev\kohya_ss/sd-scripts/sdxl_train_network.py" --bucket_no_upscale --bucket_reso_steps=32 --cache_latents --cache_latents_to_disk --caption_extension=".txt" --enable_bucket --min_bucket_reso=64 --max_bucket_reso=2048 --full_bf16 --learning_rate="0.0003" --logging_dir="C:/AI/tmp/log" --lr_scheduler="constant" --lr_scheduler_num_cycles="80" --max_data_loader_n_workers="1" --max_grad_norm="1" --resolution="512,512" --max_train_steps="5200" --min_timestep=0 --mixed_precision="bf16" --network_alpha="16" --network_dim=32 --network_module=networks.lora --no_half_vae --optimizer_type="AdamW" --output_dir="C:/AI/tmp/model/44" --output_name="Natusik" --pretrained_model_name_or_path="C:/AI/sd_xl_base_1.0.safetensors" --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="bf16" --seed="1" --text_encoder_lr=0.0003 --train_batch_size="1" --train_data_dir="C:/AI/tmp/img" --unet_lr=0.0003 --wandb_api_key="False" --xformers --sample_sampler=euler_a --sample_prompts="C:/AI/tmp/model/44\sample\prompt.txt" --sample_every_n_epochs=1

kohya-ss commented 7 months ago

The default resolution for SDXL is 1024x1024. If you train a LoRA at 512x512, the training result is unpredictable. The LoRA may still produce good generations at 512x512 in Fooocus etc., or it may work better at 1024x1024.

If you didn't specify --H and --W in prompt.txt, the sample generations are 512x512, so generating images at 512x512 in Fooocus may give similar results.
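For example, each line of the sample prompt file can carry per-prompt options such as width, height, seed, CFG scale, steps, and negative prompt. A line along these lines (the prompt text and values are placeholders) would make the training-time sample render at 1024x1024:

a photo of natusik, best quality --w 1024 --h 1024 --d 1 --l 7 --s 28 --n low quality, worst quality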

Mosfett1975 commented 7 months ago

Well, I deleted Kohya-ss, installed the Dev version without Triton, and used this config: accelerate launch --gpu_ids="0" --num_cpu_threads_per_process=8 "C:\AI\kohya_dev\kohya_ss/sd-scripts/sdxl_train_network.py" --network_train_unet_only --bucket_no_upscale --bucket_reso_steps=32 --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --caption_extension=".txt" --clip_skip=2 --enable_bucket --min_bucket_reso=1024 --max_bucket_reso=2048 --full_bf16 --gradient_checkpointing --learning_rate="0.004" --logging_dir="C:/AI/tmp/log" --lr_scheduler="constant_with_warmup" --lr_scheduler_num_cycles="60" --max_data_loader_n_workers="1" --max_grad_norm="1" --resolution="1024,1024" --max_token_length=150 --max_train_steps="4000" --min_snr_gamma=5 --min_timestep=0 --mixed_precision="bf16" --network_alpha="1" --network_dim=256 --network_module=networks.lora --no_half_vae --multires_noise_iterations="8" --multires_noise_discount="0.35" --optimizer_args "scale_parameter=False", "relative_step=False", "warmup_init=False" --optimizer_type="Adafactor" --output_dir="C:/AI/tmp/model/3" --output_name="Natusik" --pretrained_model_name_or_path="C:/AI/sd_xl_base_1.0.safetensors" --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="bf16" --text_encoder_lr=0.004 --train_batch_size="1" --train_data_dir="C:/AI/tmp/img" --unet_lr=0.004 --wandb_api_key="False" --xformers --sample_sampler=euler_a --sample_prompts="C:/AI/tmp/model/3\sample\prompt.txt" --sample_every_n_epochs=1

I got a decent sample at step 1690 (the 26th epoch) with seed 1 in the sample prompt, then tried to reproduce it with Fooocus at 1024x1024 with seed 1. I made 10 images but again didn't get what I expected. Perhaps I'm doing something wrong and should change my settings.

Mosfett1975 commented 7 months ago

Ohh, I reproduced it! But the bad news is that it only works with the base model I used for training (sd_xl_base_1.0 in my case); other base models don't give the result I expected :(

suede299 commented 7 months ago

I used network_module=lycoris.kohya for training, so I'm not sure what went wrong. I have gotten good training results on the same version of sd-scripts: even though the sample images and the images generated from the final LoRA look different, the concepts are learned, so it's just a difference, not an error. But when I used the newer features in LyCORIS, only the sample images came out correctly; the images generated by loading the LoRA on its own were disorganized and full of black stripes. After several tests, I wonder whether this is the process for sample generation during training: save the LoRA weight file => load the checkpoint => load the saved file => generate the image. If sample generation doesn't include the save => load step for the LoRA file, then I can be fairly sure the problem is on the file-saving side of the new LyCORIS feature.

kohya-ss commented 7 months ago

Ohh, I reproduced it! But the bad news is that it only works with the base model I used for training (sd_xl_base_1.0 in my case); other base models don't give the result I expected :(

Yes, some LoRAs do not seem to work well with models they were not trained on.

kohya-ss commented 7 months ago

Neither sdxl_gen_img.py nor gen_imgs.py supports LyCORIS for image generation. Auto1111's Web UI and ComfyUI may also not support the new LyCORIS features yet, so please check whether they are supported.

DKnight54 commented 7 months ago

After several tests, I wonder whether this is the process for sample generation during training: save the LoRA weight file => load the checkpoint => load the saved file => generate the image. If sample generation doesn't include the save...

@suede299 Having dug through the sample image generation code, I am confident in saying no, that's not how sample generation works.

The base training model plus the network weights are never unloaded; they are used directly, while still loaded in memory, to generate the sample images. I suspect this can lead to some differences between loading the final saved LoRA and generating sample images while everything is still in memory, but from what I've experienced, the differences should usually be minute.
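As a rough illustration of where a small save/load difference could creep in (this is not sd-scripts' actual code, just a minimal sketch): the in-memory weights stay at the training precision, while saving casts them to whatever --save_precision asks for, and external UIs only ever see the saved file.

import torch
from safetensors.torch import save_file, load_file

# Hypothetical LoRA tensor still held in memory during training (fp32 here for illustration).
in_memory = {"lora_unet_block.lora_down.weight": torch.randn(16, 320)}

# Saving casts to the requested precision; this file is what external UIs later load.
saved = {k: v.to(torch.bfloat16).contiguous() for k, v in in_memory.items()}
save_file(saved, "example_lora.safetensors")

# Reloading gives back the rounded weights, slightly different from the in-memory ones.
reloaded = load_file("example_lora.safetensors")
diff = (in_memory["lora_unet_block.lora_down.weight"] - reloaded["lora_unet_block.lora_down.weight"].float()).abs().max().item()
print(f"max abs difference after the save/load round trip: {diff:.6f}")

In normal cases this rounding is tiny, which matches the observation that the differences are usually minute; a badly corrupted result points at something else in the saving path.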

suede299 commented 7 months ago

After several tests, I wonder whether this is the process for sample generation during training: save the LoRA weight file => load the checkpoint => load the saved file => generate the image. If sample generation doesn't include the save...

@suede299 Having dug through the sample image generation code, I am confident in saying no, that's not how sample generation works.

The base training model plus the network weights are never unloaded; they are used directly, while still loaded in memory, to generate the sample images. I suspect this can lead to some differences between loading the final saved LoRA and generating sample images while everything is still in memory, but from what I've experienced, the differences should usually be minute.

Thanks, your explanation was very helpful. When the images produced by the two methods are extremely different, it then becomes reasonable to suspect an error in saving the LoRA file.

DKnight54 commented 7 months ago

@suede299, while I don't know your training parameters, if they're similar to @Mosfett1975's, where he is training in bf16 and saving in bf16, I'm somewhat tempted to wonder whether that might be related to the saving issue, especially considering that most of the models we use are probably in FP16 format.

Perhaps as an experiment, try training and saving in FP16 and see if the results are still very different when loaded?

I've not experienced this issue thus far, but I've mainly been training in FP16.
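Concretely, that experiment would only change the precision-related flags on the training command, roughly like this (everything else stays the same; shown here against the command posted above):

accelerate launch ... sdxl_train_network.py ... --mixed_precision="fp16" --save_precision="fp16" ...

(and --full_fp16 in place of --full_bf16, if full half-precision training is wanted).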

suede299 commented 7 months ago

@suede299, while I don't know your training parameters, if they're similar to @Mosfett1975's, where he is training in bf16 and saving in bf16, I'm somewhat tempted to wonder whether that might be related to the saving issue, especially considering that most of the models we use are probably in FP16 format.

Perhaps as an experiment, try training and saving in FP16 and see if the results are still very different when loaded?

I've not experienced this issue thus far, but I've mainly been training in FP16.

On the same version, training in bf16 and saving as fp16, I've gotten normal results. Of course, I didn't strictly control the variables in that test. kohya-ss mentioned in another issue that it might be related to the recently updated bitsandbytes dependency and the 8-bit optimizer, so I'll check the requirements and try again (maybe the GUI has different requirements than this repo).

suede299 commented 7 months ago

It's now been determined that what I'm experiencing can only be reproduced when using certain LyCORIS features together with rank_dropout > 0, so it's not a problem with sd-scripts. In short, when you use dora_wd=True, you should leave rank_dropout=0.
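For anyone who hits the same thing: these are LyCORIS options passed through --network_args, so a configuration that avoids the problem would look roughly like this (argument names taken from the LyCORIS documentation; treat the exact spelling as an assumption rather than a verified recipe):

--network_module=lycoris.kohya --network_args "algo=lora" "dora_wd=True" "rank_dropout=0"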