hnmr293 / sd-webui-cutoff

Cutoff - Cutting Off Prompt Effect

del #25

Closed aleksusklim closed 12 months ago

aleksusklim commented 12 months ago

Is it possible to train a LoRA together with an Embedding? Here are some thoughts that led to this, from training a LoRA for an object:

  1. Training the entire CLIP is wrong. It is best left frozen.
  2. Without learnable CLIP, we cannot change the meaning of words.
  3. With or without a learned CLIP, given a prompt "a photo of sks in the forest" – why would LoRA learn sks but not learn photo and forest along with it?
  4. Generally, I do not want to learn anything except my token.
  5. You could say "just use TI then!", but Embeddings are weak at learning complex concepts.
  6. You could say "use regularization then!", but in this case there is no "class word" (and I don't want to introduce one); regularizing against "forest" and anything else I might have in the descriptions feels wrong.
  7. If it were possible to use a learnable embedding in place of a chosen token ("sks", possibly initialized with a class word), that would be more correct, because the object would clearly be stored inside this embedding and not in any other word.
  8. General LoRA training should help the embedding reach its target quicker. It's a compromise between training the entire CLIP and not training it at all.
  9. The learning rate for the embedding should be set separately from the learning rate for the U-Net (or for CLIP, if needed), because the best rate is yet to be discovered.
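The scheme in points 1–9 can be sketched in PyTorch: freeze the token embedding table, swap a single learnable vector in for the placeholder token, and give that vector its own learning rate via optimizer parameter groups. All names, sizes, and the token ids below are hypothetical, and a toy loss stands in for the real diffusion objective; the point is only the gradient routing.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the CLIP token embedding table, frozen (point 1).
vocab_size, dim = 16, 8
token_table = torch.nn.Embedding(vocab_size, dim)
token_table.weight.requires_grad_(False)

# Hypothetical token ids: "sks" placeholder, initialized from a class word (point 7).
SKS_ID, CLASS_ID = 3, 5
sks_vec = torch.nn.Parameter(token_table.weight[CLASS_ID].clone())

def embed(ids):
    # Frozen lookup, with the learnable vector substituted wherever
    # the placeholder token appears in the prompt.
    mask = (ids == SKS_ID).unsqueeze(-1)
    return torch.where(mask, sks_vec, token_table(ids))

# Stand-in for LoRA / U-Net weights trained at a different rate (points 8–9).
lora_weight = torch.nn.Parameter(0.01 * torch.randn(dim, dim))

opt = torch.optim.AdamW([
    {"params": [sks_vec], "lr": 5e-3},      # embedding learning rate
    {"params": [lora_weight], "lr": 1e-4},  # LoRA / U-Net learning rate
])

ids = torch.tensor([1, 2, SKS_ID, 4])  # e.g. "a photo of sks ..."
emb = embed(ids)
loss = (emb @ lora_weight).pow(2).mean()  # toy objective
loss.backward()
opt.step()

print(token_table.weight.grad)   # None: the frozen table is never trained
print(sks_vec.grad is not None)  # True: only the placeholder token learns
```

Only `sks_vec` and `lora_weight` receive gradients, so the concept ends up in the placeholder embedding (plus the LoRA weights) rather than leaking into "photo" or "forest".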

What do you think? Otherwise, I'm not quite sure how to train a LoRA on something that is neither a character nor a style. For example, to train a LoRA for the "scar" concept: what descriptions should we choose? Should we say "sks over eye, 1boy, …"? If so, isn't it more logical to say "scar over eye, 1boy, …" directly? But then, how can we be sure that only the concept of "scar" would be changed, and not the concept of "1boy"?

aleksusklim commented 12 months ago

Sorry, wrong repo!