rinongal / textual_inversion


New Feature: Vector Shuffling #119

Open TaleirOfDeynai opened 1 year ago

TaleirOfDeynai commented 1 year ago

This is a back-port of a feature from my own fork (which has since been heavily refactored). It was working so well that I felt I had to clean it up and offer it back to the main repo.

I've been working a lot with multi-vector embeddings lately.

One issue I've been having is what I've termed "vector cross-talk". This is where you might be training an embedding on a character with blue eyes and red hair, but the AI sometimes generates imagery of the character with red eyes and blue hair, or blue eyes and blue hair, etc.

I suspect the cross-talk happens because individual vectors end up encoding the individual colors associated with the character, and the AI sometimes just free-associates those colors onto other features of the embedding and/or prompt. So, what if the AI couldn't rely on vector ordering as much when learning?

This introduces a feature called vector shuffling, where the individual vectors of a multi-vector embedding are shuffled randomly when they're inserted into the prompt during training. The idea is that this forces each vector to encode a sharper, more focused concept. No more blue AND eyes AND red AND hair; because the ordering is scrambled, the model can only learn the concept correctly if it encodes the entirety of a concept in one vector, i.e. blue eyes AND red hair.
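As a rough illustration (not the PR's actual code), here's a minimal PyTorch sketch of the idea, assuming the placeholder's embedding arrives as a `(num_vectors, embed_dim)` tensor that gets spliced into the prompt's token embeddings on each training step:

```python
import torch

def shuffle_placeholder_vectors(embedding: torch.Tensor) -> torch.Tensor:
    """Randomly permute the rows (vectors) of one placeholder's embedding.

    embedding: (num_vectors, embed_dim) tensor for a single placeholder token.
    """
    perm = torch.randperm(embedding.shape[0], device=embedding.device)
    return embedding[perm]

# Example: a hypothetical 4-vector placeholder with 768-dim vectors.
vectors = torch.randn(4, 768, requires_grad=True)
shuffled = shuffle_placeholder_vectors(vectors)  # same vectors, new order each step
```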

This is probably why using only 1 vector got the best results in the study: with a single vector, the model had no choice but to encode a focused embedding. This feature tries to bridge the gap and give the best of both worlds: more storage for more complex subjects, and focused vectors that resist leaking into other aspects of the prompt.

That's the theory, anyways. I'm no data scientist, so someone who is can take this and test these claims, but doing this seemed to cut down on vector cross-talk in my tests. It can still happen, but it significantly reduced occurrences and their severity when using the trained embedding.

You can enable this feature by setting model.params.personalization_config.params.shuffle_mode to true in your project's config YAML file, and it will shuffle all vectors.
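For reference, the relevant part of the config YAML might look something like this (a sketch; `num_vectors_per_token` is shown only for context, and the rest of your config will differ):

```yaml
model:
  params:
    personalization_config:
      params:
        num_vectors_per_token: 4
        shuffle_mode: true  # shuffle all of the placeholder's vectors during training
```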

Having done a lot of experimentation with the concept, I also added support for the following additional options. Experiment with them and see what works best for your particular subject.

If the number of vectors is below what an option requires, it acts the same as off unless otherwise noted.

The dynamic setting exists because some of my experiments allowed for different numbers of vectors for different placeholders when training multi-placeholder embeddings (like in the per_image_tokens mode). It's also just a good option to suggest when someone isn't sure which to use. It will try to provide as much benefit as it can regardless of what num_vectors_per_token is set to.

bonlime commented 1 year ago

+1 for this idea. For me, random shuffling of all vectors with a probability of 1 gives the best results.
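Continuing the earlier sketch, applying the shuffle with a configurable probability might look like this (a hypothetical helper; a probability of 1.0 shuffles on every training step):

```python
import random
import torch

def maybe_shuffle(embedding: torch.Tensor, p: float = 1.0) -> torch.Tensor:
    """Shuffle the placeholder's vectors with probability p; p=1.0 shuffles every step."""
    if random.random() < p:
        perm = torch.randperm(embedding.shape[0], device=embedding.device)
        return embedding[perm]
    return embedding
```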

isamu-isozaki commented 1 year ago

This is very interesting. It seems like diffusion models might work better with a bag-of-words-like approach.

FurkanGozukara commented 1 year ago

This looks promising and should be implemented, I think.