rinongal / textual_inversion


New Feature: Vector Shuffling #119

Open TaleirOfDeynai opened 1 year ago

TaleirOfDeynai commented 1 year ago

This is a back-port of a feature from my own fork (which has since been heavily refactored). It was working so well that I felt I had to clean it up and offer it back to the main repo.

I've been working a lot with multi-vector embeddings lately.

One issue I've been having is what I've termed "vector cross-talk". This is where you might be training an embedding on a character with blue eyes and red hair, but the AI sometimes generates imagery of the character with red eyes and blue hair, or blue eyes and blue hair, etc.

I suspect the cross-talk happens because individual vectors end up encoding the individual colors associated with the character, and the AI sometimes just free-associates those colors onto other features of the embedding and/or prompt. So, what if the AI couldn't rely on vector ordering as much when learning?

This introduces a feature called vector shuffling, where the individual vectors of a multi-vector embedding are shuffled randomly when they're inserted into the prompt during training. The idea is that this forces each vector to encode a sharper, more focused concept. No more blue AND eyes AND red AND hair; because the ordering is scrambled, the model can only learn the concept correctly if it encodes the entirety of a concept in one vector, i.e. blue eyes AND red hair.
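As a rough illustration (not the PR's actual code), here's a minimal PyTorch sketch of the idea, assuming the placeholder's embedding arrives as a `(num_vectors, embed_dim)` tensor that gets spliced into the prompt's token embeddings on each training step:

```python
import torch

def shuffle_placeholder_vectors(embedding: torch.Tensor) -> torch.Tensor:
    """Randomly permute the rows (vectors) of one placeholder's embedding.

    embedding: (num_vectors, embed_dim) tensor for a single placeholder token.
    """
    perm = torch.randperm(embedding.shape[0], device=embedding.device)
    return embedding[perm]

# Example: a hypothetical 4-vector placeholder with 768-dim vectors.
vectors = torch.randn(4, 768, requires_grad=True)
shuffled = shuffle_placeholder_vectors(vectors)  # same vectors, new order each step
```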

This is probably why using only 1 vector got the best results in the study: with a single vector, the model had no choice but to encode a focused embedding. This feature tries to bridge the gap and give the best of both worlds: more storage for more complex subjects, and focused vectors that resist leaking into other aspects of the prompt.

That's the theory, anyways. I'm no data scientist, so someone who is can take this and test these claims, but doing this seemed to cut down on vector cross-talk in my tests. It can still happen, but it significantly reduced occurrences and their severity when using the trained embedding.

You can enable this feature by setting model.params.personalization_config.params.shuffle_mode to true in your project's config YAML file, and it will shuffle all vectors.
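For reference, the relevant part of the config YAML might look something like this (a sketch; `num_vectors_per_token` is shown only for context, and the rest of your config will differ):

```yaml
model:
  params:
    personalization_config:
      params:
        num_vectors_per_token: 4
        shuffle_mode: true  # shuffle all of the placeholder's vectors during training
```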

Having done a lot of experimentation with the concept, I also added support for the following additional options. Experiment with them and see what works best for your particular subject.

If the number of vectors is below what an option requires, it acts the same as off unless otherwise noted.

The dynamic setting exists because some of my experiments allowed for different numbers of vectors for different placeholders when training multi-placeholder embeddings (like in the per_image_tokens mode). It's also just a good option to suggest when someone isn't sure which to use. It will try to provide as much benefit as it can regardless of what num_vectors_per_token is set to.

bonlime commented 1 year ago

+1 for this idea. For me, random shuffling of all vectors with a probability of 1 gives the best results.
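Continuing the earlier sketch, applying the shuffle with a configurable probability might look like this (a hypothetical helper; a probability of 1.0 shuffles on every training step):

```python
import random
import torch

def maybe_shuffle(embedding: torch.Tensor, p: float = 1.0) -> torch.Tensor:
    """Shuffle the placeholder's vectors with probability p; p=1.0 shuffles every step."""
    if random.random() < p:
        perm = torch.randperm(embedding.shape[0], device=embedding.device)
        return embedding[perm]
    return embedding
```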

isamu-isozaki commented 1 year ago

This is very interesting. It seems like diffusion models might work better with a bag-of-words-like approach.

FurkanGozukara commented 1 year ago

This looks promising and should be implemented, I think.