drimeF0 closed this issue 1 year ago.
You don't want to train SDXL with 256x1024 and 512x512 images; those are too small. You should use 1024x1024 resolution for 1:1 aspect ratio and 512x2048 for 1:4 aspect ratio. Did you disable upscaling bucket resolutions?
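For reference, here is a rough Python sketch of how I understand the bucket assignment to work (this is not kohya's actual code; the names in the comments mirror the prepare_buckets_latents.py flags, and the numbers are only examples):

# Rough sketch (my understanding, not kohya's implementation) of how an image
# is assigned to an aspect-ratio bucket.
max_area = 1024 * 1024          # --max_resolution 1024,1024
step = 64                       # --bucket_reso_steps
min_reso, max_reso = 256, 2048  # --min_bucket_reso / --max_bucket_reso (example values)

def candidate_buckets():
    buckets = set()
    w = min_reso
    while w <= max_reso:
        # tallest height (a multiple of step) that keeps the area under max_area
        h = min(max_reso, (max_area // w) // step * step)
        if h >= min_reso:
            buckets.add((w, h))
            buckets.add((h, w))
        w += step
    return sorted(buckets)

def closest_bucket(img_w, img_h):
    ar = img_w / img_h
    return min(candidate_buckets(), key=lambda b: abs(b[0] / b[1] - ar))

print(closest_bucket(512, 512))    # -> (1024, 1024): a 512x512 image would be upscaled unless upscaling is disabled
print(closest_bucket(256, 1024))   # -> a tall bucket such as (512, 2048)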
I fixed the bucket resolution.
%cd /content/kohya_ss/finetune
!python3 prepare_buckets_latents.py --bucket_reso_steps 64 --min_bucket_reso 1024 --max_bucket_reso 1024 --max_resolution 1024,1024 /content/dataset /content/dataset.json /content/dataset_lat.json "femboysLover/blue_pencil-fp16-XL"

%cd /content/kohya_ss
!python3 "./sdxl_train_network.py" --in_json "/content/dataset_lat.json" --network_weights "/content/out/last-step00000050.safetensors" --pretrained_model_name_or_path="femboysLover/blue_pencil-fp16-XL" --train_data_dir="/content/dataset" --resolution="1024,1024" --output_dir="/content/out" --network_alpha="16" --network_dim "32" --save_model_as=safetensors --network_module=networks.lora --output_name="last" --no_half_vae --learning_rate="5e-5" --lowram --lr_scheduler="constant" --train_batch_size="4" --max_train_steps="5000" --save_every_n_steps="50" --mixed_precision="fp16" --save_precision="fp16" --seed="12345" --optimizer_type="AdamW8bit" --min_snr_gamma=5 --mem_eff_attn --gradient_checkpointing --full_fp16 --xformers --sample_sampler=euler_a --sample_prompts="/content/prompt.txt" --sample_every_n_steps="50" --network_train_unet_only --cache_text_encoder_outputs
The result is still terrible: with a high LR the output turns into a mess, and with a low LR the model just slowly degrades and the style drifts. I've tried different datasets; currently, as a test, I'm using 48 images from safebooru for the one_eye_closed tag.
Here is an example image at 600 training steps
%cd /content/kohya_ss
!python3 "./sdxl_train_network.py" --in_json "/content/dataset_lat.json" --pretrained_model_name_or_path="femboysLover/blue_pencil-fp16-XL" --train_data_dir="/content/dataset" --resolution="1024,1024" --output_dir="/content/out" --network_alpha="32" --network_dim "64" --save_model_as=safetensors --network_module=networks.lora --output_name="last" --no_half_vae --learning_rate="1.0" --scale_weight_norms=1 --lr_scheduler="adafactor" --lr_scheduler_num_cycles="1" --train_batch_size="4" --max_train_steps="5000" --save_every_n_steps="50" --mixed_precision="fp16" --save_precision="fp16" --seed="12345" --optimizer_type="adafactor" --mem_eff_attn --gradient_checkpointing --full_fp16 --xformers --sample_sampler=euler_a --sample_prompts="/content/prompt.txt" --sample_every_n_steps="50" --network_train_unet_only --cache_text_encoder_outputs --lowram
Can you provide an example of what it looks like when you train at .0001 learning rate?
.0001
50 steps
200 steps
%cd /content/kohya_ss
!python3 "./sdxl_train_network.py" --in_json "/content/dataset_lat.json" --pretrained_model_name_or_path="femboysLover/blue_pencil-fp16-XL" --train_data_dir="/content/dataset" --resolution="1024,1024" --output_dir="/content/out" --network_alpha="32" --network_dim "64" --save_model_as=safetensors --network_module=networks.lora --output_name="last" --no_half_vae --learning_rate="0.0001" --scale_weight_norms=1 --lr_scheduler="adafactor" --lr_scheduler_num_cycles="1" --train_batch_size="4" --max_train_steps="5000" --save_every_n_steps="50" --mixed_precision="fp16" --save_precision="fp16" --seed="12345" --optimizer_type="adafactor" --mem_eff_attn --gradient_checkpointing --full_fp16 --xformers --sample_sampler=euler_a --sample_prompts="/content/prompt.txt" --sample_every_n_steps="50" --network_train_unet_only --cache_text_encoder_outputs --lowram
I've never used the full_fp16 or lowram settings before, so I don't know if those could be negatively affecting your results. Are you using the "#_trigger class" naming convention in your image inputs folder?
No, I don't use it; as far as I know that convention is for DreamBooth training, and I'm training the LoRA based on tags from safebooru.
Could you please test with the basic settings, such as AdamW optimizer, constant scheduler, network_alpha=1, learning rate=1e-4?
NaN loss and black images
%cd /content/kohya_ss
!python3 "./sdxl_train_network.py" --in_json "/content/dataset_lat.json" --pretrained_model_name_or_path="femboysLover/blue_pencil-fp16-XL" --train_data_dir="/content/dataset" --resolution="1024,1024" --output_dir="/content/out" --network_alpha="1" --network_dim "64" --save_model_as=safetensors --network_module=networks.lora --output_name="last" --no_half_vae --learning_rate="1e-4" --lr_scheduler="constant" --train_batch_size="4" --max_train_steps="5000" --save_every_n_steps="50" --mixed_precision="fp16" --save_precision="fp16" --seed="12345" --optimizer_type="adamw" --mem_eff_attn --gradient_checkpointing --full_fp16 --xformers --sample_sampler=euler_a --sample_prompts="/content/prompt.txt" --sample_every_n_steps="50" --network_train_unet_only --cache_text_encoder_outputs --lowram
fp16 training (mixed_precision, save_precision and full_fp16) seemed to cause the NaN issue. Please use bf16 instead of fp16. accelerate config is also needed.
It seems Google Colab with a T4 does not support bf16.
What about --scale_weight_norms=1? I've heard that this can solve the problem with NaN loss.
I tested it with different values and it didn't work.
I'm guessing you are using this implementation in Colab.
Couple of things I'd like to check
1. What is the original resolution of the images you are using?
2. You seem to have selected "cache_text_encoder_outputs"; have you tried training without that?
3. You mentioned Dreambooth training, but AFAIK that needs regularization images, which I haven't been able to get working yet. (If you did, TEACH ME SEMPAI)
4. Can you paste the output from the "Bucketing and Latents Caching" step?
5. Have you tried setting a VAE?
6. You mentioned testing a few datasets; have you tried different models?
7. I think you are training on the model you'd like to use. Perhaps try training on the base SDXL model, then applying the LoRA to that model?
8. It seems like you are trying to train the "one eye closed" concept. If that's the case, I'm under the impression that for best results you'd need a sample size larger than 50 images, and to carefully tag everything about the images. (I like to refer to [this resource](https://rentry.org/59xed3) as a guide)
I've been getting decent results on my first try, and here is my bucketing and latent caching output (ignore the total numbers, I'm obsessed and went overboard):
Found 379 images.
Creating a new metadata file
Merging tags and captions into metadata json.
100% 379/379 [00:24<00:00, 15.23it/s]
No captions found for any of the 379 images
All 379 images have tags
Cleaning captions and tags.
100% 379/379 [00:00<00:00, 3465.12it/s]
Writing metadata: /content/LoRA/meta_clean.json
Done!
found 379 images.
loading existing metadata: /content/LoRA/meta_clean.json
load VAE: /content/vae/sdxl_vae.safetensors
100% 379/379 [00:15<00:00, 24.25it/s]
bucket 0 (448, 1024): 20
bucket 1 (512, 1024): 2
bucket 2 (576, 1024): 39
bucket 3 (640, 1024): 1
bucket 4 (704, 1024): 51
bucket 5 (768, 1024): 92
bucket 6 (832, 1024): 1
bucket 7 (1024, 448): 1
bucket 8 (1024, 576): 2
bucket 9 (1024, 704): 7
bucket 10 (1024, 768): 21
bucket 11 (1024, 1024): 142
mean ar error: 0.01277014918625594
writing metadata: /content/LoRA/meta_lat.json
done!
And Training Config output
[sdxl_arguments]
cache_text_encoder_outputs = false
no_half_vae = true
min_timestep = 0
max_timestep = 1000
shuffle_caption = true
lowram = true

[model_arguments]
pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
vae = "/content/vae/sdxl_vae.safetensors"

[dataset_arguments]
debug_dataset = false
in_json = "/content/LoRA/meta_lat.json"
train_data_dir = "/content/drive/MyDrive/LoRA/train_2"
dataset_repeats = 5
keep_tokens = 0
resolution = "1024,1024"
color_aug = false
token_warmup_min = 1
token_warmup_step = 0

[training_arguments]
output_dir = "/content/drive/MyDrive/kohya-trainer/output/Blue_Waifu"
output_name = "Blue_Waifu"
save_precision = "fp16"
save_every_n_epochs = 1
train_batch_size = 5
max_token_length = 225
mem_eff_attn = false
sdpa = true
xformers = false
max_train_epochs = 3
max_data_loader_n_workers = 8
persistent_data_loader_workers = true
gradient_checkpointing = true
gradient_accumulation_steps = 1
mixed_precision = "fp16"

[logging_arguments]
log_with = "tensorboard"
logging_dir = "/content/LoRA/logs"
log_prefix = "Blue_Waifu"

[sample_prompt_arguments]
sample_every_n_epochs = 1
sample_sampler = "euler_a"

[saving_arguments]
save_model_as = "safetensors"

[optimizer_arguments]
optimizer_type = "AdaFactor"
learning_rate = 0.0001
max_grad_norm = 0
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False",]
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100

[additional_network_arguments]
no_metadata = false
network_module = "networks.lora"
network_dim = 32
network_alpha = 16
network_args = []
network_train_unet_only = true

[advanced_training_config]
save_state = false
save_last_n_epochs_state = false
multires_noise_iterations = 6
multires_noise_discount = 0.3
caption_dropout_rate = 0
caption_tag_dropout_rate = 0.1
caption_dropout_every_n_epochs = 0
min_snr_gamma = 5

[prompt]
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, "
width = 1024
height = 1024
scale = 12
sample_steps = 28

[[prompt.subset]]
prompt = "masterpiece, best quality, face focus, cute, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck"
1. Here are the image resolutions from the dataset and their counts:
(2002, 3508): 1,
(1100, 1100): 1,
(2160, 3840): 1,
(2564, 3624): 1,
(1278, 1278): 1,
(512, 512): 1,
(1270, 945): 1,
(992, 1403): 1,
(1254, 1771): 2,
(2432, 3200): 1,
(1448, 2048): 1,
(1920, 2560): 1,
(5787, 5785): 1,
(1000, 1432): 1,
(652, 990): 1,
(1024, 1024): 2,
(1542, 2041): 1,
(2300, 3800): 1,
(1329, 1873): 1,
(888, 1013): 1,
(844, 1341): 1,
(690, 930): 1,
(775, 1100): 1,
(900, 1200): 1,
(2353, 4093): 1,
(826, 1200): 1,
(562, 800): 1,
(700, 1050): 1,
(2083, 3542): 1,
(1250, 1761): 1,
(800, 1225): 1,
(2591, 3624): 1,
(1200, 1389): 1,
(1576, 3745): 1,
(1600, 2400): 1,
(581, 841): 1,
(1507, 1653): 1,
(1530, 2047): 1,
(1447, 2047): 1
/content/kohya_ss/finetune
2023-09-21 07:51:49.881718: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
found 41 images.
loading existing metadata: /content/dataset.json
load VAE: femboysLover/blue_pencil-fp16-XL
exception occurs in loading vae: femboysLover/blue_pencil-fp16-XL does not appear to have a file named config.json.
retry with subfolder='vae'
Downloading (…)main/vae/config.json: 100% 602/602 [00:00<00:00, 3.40MB/s]
Downloading (…)ch_model.safetensors: 100% 167M/167M [00:00<00:00, 201MB/s]
The config attributes {'force_upcast': True} were passed to AutoencoderKL, but are not expected and will be ignored. Please verify your config.json configuration file.
83% 34/41 [00:37<00:07, 1.10s/it]
/usr/local/lib/python3.10/dist-packages/PIL/Image.py:996: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
warnings.warn(
100% 41/41 [00:43<00:00, 1.07s/it]
bucket 0 (768, 1024): 31
bucket 1 (896, 1024): 2
bucket 2 (960, 1024): 1
bucket 3 (1024, 768): 1
bucket 4 (1024, 1024): 6
mean ar error: 0.05769353115186938
writing metadata: /content/dataset_lat.json
done!
I am using the same settings as in this video of mine - all 1024x1024, a very easy training run with 13 pictures of myself.
Repeating 40, trained up to 8 epochs; all epochs overtrained.
The results are super overtrained, nothing like me.
Something is seriously broken with SDXL LoRA.
Attached the training config as txt: best_settings_32_rank_lora.txt
Doing more tests to figure out the issue. DreamBooth training, on the other hand, is working amazingly.
Here is a tweet where I compared results: https://twitter.com/GozukaraFurkan/status/1704625590437814424
Become A Master Of SDXL Training With Kohya SS LoRAs - Combine Power Of Automatic1111 & SDXL LoRAs
Something is very broken with LoRA training.
5e-5 speed
https://twitter.com/GozukaraFurkan/status/1704867030761984091
I am gonna test my tutorial commit and let you know.
@drimeF0, one thing I'm noticing in your buckets is that the images are heavily skewed towards 768 x 1024, with 31 images, while only 6 landed in 1024 x 1024, yet the samples you are generating are 1024 x 1024. This may be resulting in the 1024 x 1024 images being undertrained with insufficient samples.
Can you try generating images at 768 x 1024? I suspect the results there may be more decent. One thing I've been doing in a personal copy of the colab is modifying the code so that max_bucket_reso is 1536 and min_bucket_reso is 640, to get a bucket distribution more in line with what's recommended for SDXL. I'm reluctant to share my version for now because it's a hacked-together version that does dreambooth with regularization, and there are parts that are still rough and inelegant, mainly because I want it to work on the free tier. (TLDR: in order to get things to run without overrunning the usage time restrictions, bucketing and latent processing are done as a manual, folder-by-folder step.)
One thing I did for my dataset was crop and resize the majority of my images into the recommended SDXL training image sizes, cribbed from this tutorial by our good friend @FurkanGozukara.
Since, during training, images are only trained against images in the same bucket, one thing I did was crop and rescale the same image into multiple sizes corresponding to the respective aspect ratios. You will note that I got lazy towards the end and skipped that for several images, resulting in buckets with 1 image in them.
If the images generated at 768 x 1024 are acceptable (you might have to retrain the LoRA at the standard learning rate of 1e-4 to avoid overcooked results), then a quick and dirty way to get the results you want may be to crop and resize most of your images to 1024 x 1024 and use that as your dataset.
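If it helps, this is roughly the kind of throwaway script I mean for that quick-and-dirty approach (a minimal sketch assuming Pillow is available; the folder paths and file glob are placeholders you'd adjust):

from pathlib import Path
from PIL import Image

src = Path("/content/dataset")        # placeholder input folder
dst = Path("/content/dataset_1024")   # placeholder output folder
dst.mkdir(parents=True, exist_ok=True)

for p in src.glob("*.png"):
    img = Image.open(p).convert("RGB")
    side = min(img.size)                       # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img.resize((1024, 1024), Image.LANCZOS).save(dst / p.name)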
Oh, one more thing. I'm not exactly sure, but it seems like the model you are using for base training does not have a built-in VAE. Not sure if that will affect the quality of the latents.
@FurkanGozukara Maybe something for you to test, but my experience with training on 379 images with 4 to 5 repeats actually got me pretty accurate results (and maybe slightly overtrained, as the images all became photorealistic unless I reduce the weight of the word "photo") within 3 to 5 epochs.
I found my error.
I was using the SDXL 0.9 beta release, which has different weights and is FP32 :)
Ignore my message.
I already tried with bucket size 1024x1024; the results are the same.
@drimeF0, I have to admit I'm stumped. I set out to try recreating your issue in this colab implementation following the majority of the default settings in it, and was able to get results like this: prompt = "masterpiece, best quality, one eye closed, solo, 1girl"
I would note that I did modify the bucketing to have a min_bucket_reso of 640 and max_bucket_reso of 1536, but aside from that, the only differences I can note are the following:
For further reference, this result came from training on 98 images scraped from safebooru with the tags "one_eye_closed, 1girl, absurdres", with the clean_caption option used at step 3.4, Bucketing and Latents Caching.
The results were from the 4th epoch, with the training image repeated 10 times each epoch at batch size 5.
I think... in order for me to troubleshoot further, I probably need to look at how you are implementing the training in your environment. And even then, I may not be able to help if it involves tweaking Kohya's code.
(Edited to include sample prompt)
Side note, I have similar issues where the LoRA keeps outputting both eyes closed. I believe that to fix this, we would need to expand the training dataset to include "eyes_closed" images where both eyes are closed, and images where both eyes are open, for the LoRA to learn the difference.
Hi, sorry for taking so long to respond. Here's the notebook: https://colab.research.google.com/drive/14BnL_yiyVGs8WFovba1t3UjcjnZiVrpi?usp=sharing
@drimeF0 I made a copy and modified it with the following changes:
I'm still running my initial test with three separate concepts on this modified version. An earlier attempt with only eyes_closed and one_eye_closed is still getting me both eyes closed @@
https://colab.research.google.com/drive/1j8y5AvtdiXl4_8CHk3wd7xxre0u5Vsxf?usp=sharing
I'm getting the impression that this notebook was originally meant to run on Kaggle, since on the Google Colab free tier, unless you are monitoring it closely, all the work is lost when Google happily disconnects you.
While I wasn't successful at training the concept (which may require training the text encoder as well, possibly better tagging of the dataset, maybe trying SDP instead of xformers, or just more training, or, as mentioned below, replacing the keyword tag with wink), I managed to hit step 579 (epoch 3) without the model collapsing.
As a matter of fact, it managed to maintain a very coherent style! (Not sure if this is the normal style expected from this model, as I haven't really played with it before.)
Epoch 1(step 193):
Epoch 2(step 386):
Epoch 3(step 579): image prompt: masterpiece, best quality, one eye closed, solo, 1girl --w 768 --h 1024 --d 123
At step 579, I also generated a couple more samples, and they still seem to carry the default styling. image prompt: masterpiece, best quality, one eye closed, solo, 1girl --w 1024 --h 1024
image prompt: masterpiece, best quality, one eye closed, solo, 1girl --w 832 --h 1216
I'm not likely to get more out of the training at this point due to running it on free Colab, and while I can't think of how to better help you train the specific one eye closed concept (hrm... maybe replace one eye closed with wink? It might be a more natural concept to train), I hope that I've at least helped you troubleshoot the model collapse issue.
Thank you very much for your help with the model collapse during training. I will later try to port the training code to Kaggle and run it at a lower LR for 5000 training steps; in addition, I will try various datasets.
@drimeF0, I suspect that the root of the issue may have something to do with using steps instead of epoch/repeats for training. If I can make a suggestion, instead of using 5000 training steps, try increasing the number of epochs or repeats.
To make the changes, you can edit the --max_train_epochs "8" and --dataset_repeats "5"
I would recommend setting dataset_repeats to 10, and increasing the epochs until you get the results you desire.
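For a rough sense of how the epoch/repeat suggestion translates into steps (using the 48-image dataset and batch size 4 mentioned earlier purely for illustration):

images = 48            # dataset size mentioned earlier in the thread
dataset_repeats = 10   # suggested value
train_batch_size = 4   # from the earlier training command
max_train_epochs = 8   # suggested value

steps_per_epoch = (images * dataset_repeats) // train_batch_size   # 120
total_steps = steps_per_epoch * max_train_epochs                   # 960
print(steps_per_epoch, total_steps)                                # far fewer than the fixed 5000 steps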
With learning_rate set to 0.0002, the LoRA learns very well:
Thank you again for your help with the LoRA model training script.
I tried 10 times to train a LoRA on Kaggle and Google Colab, and each time the training results were terrible, even after 5000 training steps on 50 images. I use this sequence of commands:
I noticed this in the sdxl_train_network execution log:
Could this break the LoRA? And if so, how do I fix it?