Open ddpasa opened 8 months ago
Thank you for the suggestion.
I am not certain that changing captions randomly makes it possible to increase the learning rate of the Text Encoder. In addition, I think caption dropping (with the caption_tag_dropout_rate
option) may have a similar effect.
However, this feature may be effective and I will consider implementing it if there are many who wish to have it :)
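For reference, the tag-level caption dropout mentioned above can be sketched roughly as follows. This is a minimal illustration of the idea behind an option like caption_tag_dropout_rate (dropping individual comma-separated tags with some probability), not the actual implementation, which may differ in details:

```python
import random

def apply_tag_dropout(caption: str, dropout_rate: float, rng: random.Random) -> str:
    """Drop each comma-separated tag independently with probability dropout_rate.

    Minimal sketch of tag-level caption dropout (in the spirit of the
    caption_tag_dropout_rate option); the real code may handle this differently.
    """
    tags = [t.strip() for t in caption.split(",")]
    kept = [t for t in tags if rng.random() >= dropout_rate]
    # Keep at least one tag so the caption never becomes empty.
    if not kept:
        kept = [rng.choice(tags)]
    return ", ".join(kept)
```

With wd14-style tag lists this randomizes which tags the Text Encoder sees each step; with a single full-sentence caption there are no independent tags to drop, which matches the point below about dropout being less helpful for full sentences.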
Caption dropout is helpful, especially if you are using wd14 tags. But it's less helpful in the case of full sentences.
I'm testing this method for dynamic captions, consisting of ["booru-based tags"], "[full sentence caption 1]", [whatever caption form], ... But this requires reworking the structure of the ImageInfo class. After the tests are done, I will open a PR.
You might try my PR to achieve the same goal.
I originally opened this issue in the bmaltais repo, but this is probably the correct place: https://github.com/bmaltais/kohya_ss/issues/1836
Overtraining the text encoder is a real problem. If you look at online guides (such as those at civitai), they recommend using lower learning rates for the TE or not training it at all.
One really cool feature could be alternating captions for each image from a set. There should be an option to treat the caption files as a set of possible captions for the image (one possible caption per line), and a caption should be randomly selected every time an image is used for training.
For example, when training a character Lora, the caption file could look like:
a photo of name
a photo of name, a woman
name
this picture shows name
.....
Each time the image is used, one of those captions would be randomly selected. This way we can get greater diversity in captions, which should make the TE training more robust.
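The proposed selection step could be sketched like this (pick_caption and the one-caption-per-line file layout are assumptions for illustration; no such option exists yet):

```python
import random
from pathlib import Path

def pick_caption(caption_file: str, rng: random.Random) -> str:
    """Pick one caption at random from a file with one candidate caption per line.

    Hypothetical sketch of the requested feature, not an existing option.
    """
    lines = Path(caption_file).read_text(encoding="utf-8").splitlines()
    candidates = [ln.strip() for ln in lines if ln.strip()]  # skip blank lines
    return rng.choice(candidates)
```

Calling this once per training step (per image) would give the caption diversity described above, while leaving single-line caption files behaving exactly as they do today.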
I believe the Embeddings training script in Automatic1111 does something like this.