nicolai256 / Stable-textual-inversion_win

MIT License

Question regarding embeddings trained with multiple placeholders_strings/initializer_words. #24

Open rabidcopy opened 2 years ago

rabidcopy commented 2 years ago

This is a bit of an odd case, but I was wondering if there is any means to take an embeddings.pt with multiple placeholders and merge or otherwise consolidate them into a single placeholder? The problem I'm facing is that while training results look promising with multiple placeholders and initializer_words, as opposed to a single placeholder and init_word, the embeddings.pt produced doesn't seem to work well with a fork of Stable Diffusion I'm using, located here: https://github.com/AUTOMATIC1111/stable-diffusion-webui

That repo supports embeddings loaded from an embeddings folder, with each .pt file renamed to the word or phrase you want to type into a prompt to use it as a textual inversion. For example: rename an embeddings.pt to test.pt, place it in the embeddings folder, then submit a prompt like "a photo of test". That all works, except when there are multiple placeholders within the embedding. The readme states: "They must be either .pt or .bin files, each with only one trained embedding, and the filename (without .pt/.bin) will be the term you'll use in the prompt to get that embedding." I assume that by one trained embedding it means one placeholder_string. Any time I try to use an embedding with multiple placeholders, the results are unexpected, nonexistent, or completely garbled junk.

From my understanding, the merge_embeddings.py script is intended to merge two embedding files with their individual placeholders kept intact, or renamed if there is a conflict.

So that's about it, my apologies for the wall of text. Greatly appreciate any insight.

Edit: Giving it some more thought, would it be possible to instead split each placeholder into its own embeddings.pt?

nicolai256 commented 2 years ago

> This is a bit of an odd case but I was wondering if there is any means to take an embeddings.pt with multiple placeholders and merge or otherwise consolidate them into a single placeholder? […] Edit: Giving it some more thought, would it be possible to instead split each placeholder into its own embeddings.pt?

Yeah, you can use merge_embeddings.py to merge embedding files. Not sure why you'd want to split them again unless you've deleted the original files?

rabidcopy commented 2 years ago

> Yeah, you can use merge_embeddings.py to merge embedding files. Not sure why you'd want to split them again unless you've deleted the original files?

I think I might have been misunderstood. I'm not trying to merge placeholders from multiple embedding files into a single embedding file with multiple placeholders; I'm trying to turn three placeholders within one embedding into one placeholder. That is, to go from an embeddings.pt containing the placeholders "*,@,#" to an embeddings.pt containing the single placeholder "-" (basically "*" + "@" + "#" = "-"). From my understanding, that's not what merge_embeddings.py does or is capable of doing.

I don't have files with single placeholders, because I used multiple placeholders/initializers for training. That's why I asked whether it's possible to split the placeholders out of a file that was trained with multiple placeholders.

As for why I would want that: you would typically load one embeddings.pt with stable_txt2img.py and then call its placeholders in a prompt. However, https://github.com/AUTOMATIC1111/stable-diffusion-webui handles embeddings differently. Instead of loading a single embeddings.pt, it lets you place multiple .pt files in a folder and call each one in a prompt by its filename. The limitation is that each .pt file in the folder can only contain a single placeholder. With stable_txt2img.py I could just write a prompt like "a photo of *,@,#", but in stable-diffusion-webui I would want each placeholder split into its own file, e.g. placeholder1.pt, placeholder2.pt, placeholder3.pt, and then use a prompt like "a photo of placeholder1, placeholder2, placeholder3".
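For anyone checking their own files: a quick way to see which placeholders a given .pt actually contains is to load it and print its keys. A minimal sketch, assuming the checkpoint uses the string_to_param layout that this repo's training code saves (if your file uses a different layout, the key lookup will fail, which itself tells you the format differs):

```python
import torch

# Load the checkpoint on CPU; no GPU is needed just to inspect it.
data = torch.load("embeddings.pt", map_location="cpu")

# Checkpoints from this repo map each placeholder string to its learned
# embedding tensor under "string_to_param".
for placeholder, param in data["string_to_param"].items():
    print(placeholder, tuple(param.shape))
```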

CodeExplode commented 2 years ago

It's definitely doable, but I think it will need new code. What you want is to join the vectors of your embeddings together, which would be easy for somebody who knows how to join torch tensors; unfortunately that isn't quite the same as joining lists in almost any other programming context, and isn't something I can currently do (though I'm meaning to try for my own reasons).

When you send text to Stable Diffusion, each word is looked up in a dictionary and has a vector of weights (or sometimes more than one) associated with it, i.e. a long list of small decimal numbers (roughly -1 to +1). Those vectors are what gets sent to Stable Diffusion's model as input. Textual inversion finds a new vector of weights to send as input, one which didn't exist as a word in the dictionary, and your embedding symbol is just a hint telling the code to get the vector from your saved file instead of from the pre-built dictionary it normally uses.
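To make that concrete, here is a toy illustration of the lookup, not the real tokenizer or embedding table (real models use a tokenizer and a vocabulary of tens of thousands of entries, but the splicing principle is the same):

```python
import torch

# Toy vocabulary: each known token maps to one 768-dim vector.
vocab = {"a": torch.randn(768), "picture": torch.randn(768), "of": torch.randn(768)}

# A learned embedding may span several vectors; its symbol is spliced in.
learned = {"*": torch.randn(2, 768)}

def embed(prompt):
    out = []
    for tok in prompt.split():
        if tok in learned:
            out.extend(learned[tok])   # one symbol can contribute several vectors
        else:
            out.append(vocab[tok])
    return torch.stack(out)            # the sequence handed to the model

print(embed("a picture of *").shape)   # torch.Size([5, 768])
```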

All the vectors are just sent in sequence to Stable Diffusion's model (which I think takes 77 vectors max). Since embeddings can already be multiple vectors long, there's no reason you couldn't join the vectors from multiple embeddings together into one new embedding, and it would presumably behave the same as using them in sequence in your prompt: "A picture of * @ #" would work the same as "A picture of -", because the sequence of vectors sent to SD's model would still be the same.
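As a rough sketch of that join (assuming the string_to_param layout mentioned above, and the "*", "@", "#" and "-" placeholders from this thread), torch.cat along the first dimension stacks the per-placeholder vectors into one longer embedding:

```python
import torch

data = torch.load("embeddings.pt", map_location="cpu")
params = data["string_to_param"]

# Concatenate along the token dimension, so the new embedding is simply the
# old ones used in sequence: "* @ #" -> "-".
joined = torch.cat([params["*"].detach(),
                    params["@"].detach(),
                    params["#"].detach()], dim=0)

torch.save({"string_to_param": {"-": joined}}, "joined.pt")
```

Note that in the webui it's the filename, not the stored placeholder key, that you type in the prompt, so joined.pt could be renamed to whatever term you like.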

Or if you wanted to split your multi-symbol embedding file into individual files, each with its own unique vectors, that's definitely doable too, and would be easier to adapt from the current merge embeddings code.
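A matching split sketch, under the same string_to_param assumption (the output names placeholder1.pt and so on are just examples; in the webui the filename minus .pt becomes the prompt term):

```python
import torch

data = torch.load("embeddings.pt", map_location="cpu")

# Write each placeholder out as its own single-embedding file, suitable for
# dropping into the webui's embeddings folder.
for i, (placeholder, param) in enumerate(data["string_to_param"].items(), start=1):
    torch.save({"string_to_param": {placeholder: param.detach()}},
               f"placeholder{i}.pt")
```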