hollowstrawberry / kohya-colab

Accessible Google Colab notebooks for Stable Diffusion Lora training, based on the work of kohya-ss and Linaqruf

Images and Captions in multiple folders, can I train all of them at once? #195

Open thegoldenboy542 opened 3 weeks ago

thegoldenboy542 commented 3 weeks ago

I have three folders with images and captions.

My Drive/Loras/Example/dataset/dataset
My Drive/Loras/Example/dataset/dataset/dataset
My Drive/Loras/Example/dataset/dataset/dataset

Each folder contains the same images but different captions for each image (one with BLIP captions, one with WD tags, and the other with tags from the site). I know the Lora trainer can train on multiple folders, at least I think so? I just don't quite understand how the whole [dataset.subset] stuff works. I don't want to use any of them as reg images, by the way.

uYouUs commented 3 weeks ago

Like you mentioned, there is a section for it, like so: [screenshot]

This is just an example placeholder; you can change it to the location of your folders as well as the number of repeats you would like per folder.

Here is another example from one I've done: [screenshot] Once set up, you click the run button at the top left of that section. After that you can run the main section to start training.
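In case it helps to see it as text rather than a screenshot, here is a minimal sketch of what that multi-folder cell boils down to, using kohya-ss's dataset config format (`[[datasets]]` / `[[datasets.subsets]]`). The variable name, folder paths, caption extension, and repeat counts below are placeholders of mine, not your actual setup, so swap them for whatever the current notebook cell and your Drive really use:

```python
# Sketch of a multi-folder dataset config (kohya-ss TOML format embedded as a string,
# which is roughly how the Colab's "multiple folders" cell is structured).
# Paths, repeat counts, and the variable name are placeholders; adjust them to your folders.
custom_dataset = """
[general]
caption_extension = ".txt"   # assumption: captions are .txt files next to each image

[[datasets]]

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/Example/dataset/blip"
num_repeats = 1

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/Example/dataset/wd"
num_repeats = 1

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/Example/dataset/site_tags"
num_repeats = 1
"""
```

None of these subsets are treated as regularization images unless you explicitly mark one as a reg subset, which you are not doing here.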

If you would like to disable using multiple folders, you can do so by running the section under it: [screenshot]

Hope this answers your question.

thegoldenboy542 commented 3 weeks ago

Just a few questions: do the num_repeats for each folder have to be different, or can they be the same? Does it have an impact on training?

EDIT: Also, if each folder contains 2279 images, how many repeats, steps, and epochs are best?

uYouUs commented 3 weeks ago
  1. You can use the same number of repeats; that's up to you. However, in that case you could probably just put them all in the same folder. If it's a naming issue, know that this sometimes still affects the trainer: you might get an error saying 1.jpg exists twice, even if the copies are in different folders.

  2. I personally don't worry too much about steps; the other two are pretty important. For a standard Lora I try to get the total (images × repeats) as close to 400 as possible: I aim for above 300 but stay below 430. Some examples: 40 images × 10 repeats = 400; 130 images × 3 repeats = 390; 100 images × 4 repeats = 400; 150 images × 2 repeats = 300 (150 × 3 = 450 is too much for me); 90 images × 4 repeats = 360.

And for a multi-folder setup: [screenshot] I had 5 images in "best" with 5 repeats for 25, 40 images in "good" with 5 repeats for 200, and 150 images in "normal" with 1 repeat for 150. Total after repeats = 25 + 200 + 150 = 375. You should see that in the output when running the trainer, like so: [screenshot]
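If it helps, the arithmetic behind those totals can be written out as a quick sketch (the folder names, epochs, and batch size below are just example numbers, not anything the notebook requires):

```python
import math

# Example values only: images and repeats per folder, matching the numbers above.
folders = {
    "best":   {"images": 5,   "num_repeats": 5},   #   5 * 5 = 25
    "good":   {"images": 40,  "num_repeats": 5},   #  40 * 5 = 200
    "normal": {"images": 150, "num_repeats": 1},   # 150 * 1 = 150
}

images_after_repeats = sum(f["images"] * f["num_repeats"] for f in folders.values())
print(images_after_repeats)  # 375 -- the total the trainer reports after repeats

# Steps then follow from epochs and batch size (example values, rounded up per epoch).
epochs, batch_size = 10, 2
steps_per_epoch = math.ceil(images_after_repeats / batch_size)
print(steps_per_epoch * epochs)  # total optimizer steps for the whole run
```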

  3. Now for epochs. This is based on how much data is being learned, and you need to know a few things to understand it. The two big settings are the network dimension (dim) and the learning rate; both change how fast the data gets learned. In SD1.5 the file size is roughly 1 MB per dim you set, so roughly 16 MB for a 16-dim Lora. If you are training SDXL, 8 dim is 56 MB, so even though 8 dim is a smaller number than 16, the actual size is bigger in SDXL. The alpha should be half of the dimension, so 16 dim / 8 alpha, or 8 dim / 4 alpha. For most characters, SDXL at 8 dim works amazingly well; no need for more. For SD1.5, since those Loras hold less data, you might want more: 32 dim / 16 alpha was more than enough for any SD1.5 character.

Essentially, the dim is the "brain" size: a bigger brain is smarter and learns faster, and it's also needed for more complex things. The second thing that matters is the learning rate, which is how big a change each adjustment to the Lora is. A higher learning rate means you adjust faster.

Now we put them together. Depending on what both of those settings are set to, the Lora will learn from the material at a certain rate. If both are too high, it will overfit and produce useless results; in the trainer, you will see the loss increase or go to NaN, like so: [screenshot] The resulting Lora from that epoch will produce useless results or black screens.

In this case I ran for 10 epochs, but as you can see it had been overfitting since epoch 5. Check the previous two epochs as well, since the one right before is sometimes also overfitted even if it does not say "nan". If one of the earlier epochs is good and you like it, then you are done. If you did not get good results, the solution is to lower the training speed, by lowering either the dim or the learning rate, so the learning is spread over more epochs. If, on the other hand, you did something like 10 epochs and the results are weak and not at the level you wanted, it lacks training, which can be fixed with either more speed (dim size / learning rate) or a longer session (more epochs).

Because dim size increases the actual file size, I prefer to increase the learning rate first, rather than the dim size, if the Lora is not training well within the epochs you are targeting (and since it has to actually run, you probably also don't want too many epochs, as that takes longer and uses more resources).

With all of that said: if you have 2279 images, that is already above the ~400 you'd want for a standard Lora, so you do not need repeats. It also means you will probably want a larger dim size for the Lora. It will be very resource-intensive and take a while; I would stick to only 2-4 epochs for that.
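To make that concrete, here is a rough starting point for a dataset that size, expressed as plain values. The field names mirror common kohya-ss options but are illustrative, so check what the Colab actually exposes (and its defaults) before copying anything over:

```python
# Rough starting point for ~2279 images, following the advice above.
# Field names echo common kohya-ss settings (network_dim, network_alpha, etc.),
# but the values are suggestions, not notebook defaults.
suggested = {
    "num_repeats": 1,        # already well past ~400 images, so no repeats needed
    "max_train_epochs": 3,   # stay in the 2-4 range; each epoch is long at this size
    "network_dim": 32,       # larger dim since there is a lot of data to absorb
                             # (for SDXL a smaller dim such as 8-16 may already be plenty)
    "network_alpha": 16,     # keep alpha at half of dim
    # Leave the learning rate at the notebook default first; if the Lora still learns
    # too slowly within your target epochs, raise the learning rate before raising dim.
}
```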