kohya-ss / sd-scripts

Apache License 2.0

Alternating Captions #1072

Open ddpasa opened 8 months ago

ddpasa commented 8 months ago

I originally opened this issue in the bmaltais repo, but this is probably the correct place: https://github.com/bmaltais/kohya_ss/issues/1836

Overtraining the text encoder (TE) is a real problem. Online guides (such as those on Civitai) recommend using a lower learning rate for the TE or not training it at all.

One really cool feature could be alternating captions for each image from a set. There should be an option to treat each caption file as a set of possible captions for the image (one possible caption per line), and a caption should be randomly selected every time the image is used for training.

For example, when training a character Lora, the caption file could look like:

a photo of name
a photo of name, a woman
name
this picture shows name
.....

Each time one of those captions would be randomly selected. This way we can get greater diversity in captions, which should make the TE training more robust.
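A minimal sketch of the proposal, assuming a caption file with one candidate caption per line. The helper names `split_captions` and `pick_caption` are hypothetical and are not part of sd-scripts; this only illustrates the selection step, not how it would be wired into the dataset code.

```python
import random


def split_captions(caption_text):
    """Treat each non-empty line of a caption file as one candidate caption."""
    return [line.strip() for line in caption_text.splitlines() if line.strip()]


def pick_caption(candidates, rng=random):
    """Pick one candidate uniformly at random.

    Intended to be called every time the image is used for training,
    so the same image sees different captions across steps.
    """
    if not candidates:
        return ""  # assumption: an empty caption file means an empty caption
    return rng.choice(candidates)


# Example caption file contents for a character LoRA (from the example above).
caption_file_text = """a photo of name
a photo of name, a woman
name
this picture shows name
"""

candidates = split_captions(caption_file_text)
caption_for_this_step = pick_caption(candidates)
```

Passing a seeded `random.Random` instance as `rng` would make the selection reproducible across runs.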

I believe the Embeddings training script in Automatic1111 does something like this.

kohya-ss commented 8 months ago

Thank you for the suggestion.

I am not certain that changing captions randomly can increase the learning rate of the Text Encoder. In addition, I think caption dropout (with the caption_tag_dropout_rate option) may have a similar effect.

However, this feature may be effective and I will consider implementing it if there are many who wish to have it :)

ddpasa commented 8 months ago

Caption dropout is helpful, especially if you are using wd14 tags. But it's less helpful in the case of full sentences.
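To illustrate the difference, here is a rough sketch of per-tag dropout on a comma-separated caption. It is an assumption about the general idea only, not the sd-scripts implementation of caption_tag_dropout_rate, and the function name `drop_tags` is hypothetical.

```python
import random


def drop_tags(caption, dropout_rate, rng=random):
    """Drop each comma-separated tag independently with probability dropout_rate."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if rng.random() >= dropout_rate]
    return ", ".join(kept)


# A tag-style caption has many pieces that can be dropped independently.
tag_caption = drop_tags("1girl, solo, smile, outdoors", 0.3)

# A full sentence without commas is a single "tag" under this scheme,
# so it is either kept whole or dropped whole.
sentence_caption = drop_tags("a photo of name", 0.3)
```

Under this scheme a full-sentence caption gains little variety from dropout, which is where choosing among several alternative captions could help.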

BootsofLagrangian commented 8 months ago

I'm testing this method for dynamic captions, where each image has a set consisting of [booru-based tags], [full-sentence caption 1], [whatever caption form], ... However, this requires restructuring the ImageInfo class. Once the tests are done, I will open a PR.

gesen2egee commented 8 months ago

You might try my PR to achieve the same goal.

https://github.com/kohya-ss/sd-scripts/pull/1106