klimaleksus / stable-diffusion-webui-embedding-merge

Extension for AUTOMATIC1111/stable-diffusion-webui for creating and merging Textual Inversion embeddings at runtime from string literals.
The Unlicense

Making embedding from prompt results in different results #4

Open jockisen opened 1 year ago

jockisen commented 1 year ago

First of all thanks for making such a great extension!

Second: when I try just to make an embedding from a prompt and negative prompt the results look different. It’s a pretty long regular prompt separated by commas as is the usual way.

aleksusklim commented 1 year ago

1) This extension may not be compatible with the latest version of WebUI; I have to look into that. 2) Confirm that you don't use any special syntax, such as attention parentheses, square brackets, or the words AND or BREAK. They make no sense inside an embedding (they are applied at a later step), so they can only be parsed literally, as characters.

jockisen commented 1 year ago

Thank you for replying!

The prompts contain words separated with commas and single spaces, like this:

a photo of a cat, beautiful, high res

That sort of composition. Not that wording ofc 😀

And I am not on the latest version of auto.

aleksusklim commented 1 year ago

Well, I cannot reproduce. Here are my steps:

00000-42

a photo of a cat, beautiful, high res
Negative prompt: blurry, cropped
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 42, Size: 512x512, Model hash: 58c5c27858, Model: suzumehachi_V10

00001-42

em_my_cat
Negative prompt: em_my_blur
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 42, Size: 512x512, Model hash: 58c5c27858, Model: suzumehachi_V10

They aren't "similar", they are exactly the same! What is your case, step-by-step?

jockisen commented 1 year ago

Well, I cannot reproduce. Here are my steps:

  • commit 22bcc7be428c94e9408f589966c2040187245d81 (pretty recent)
  • EmbeddingMerge works fine as it is (to my surprise, nothing broke it!)
  • I generate "a photo of a cat, beautiful, high res" with negative "blurry, cropped" at default parameters, fix the seed.
  • I go into EM and put "a photo of a cat, beautiful, high res", saving as "em_my_cat" for example.
  • I put "blurry, cropped" saving as "em_my_blur".
  • I return to txt2img and use em_my_cat as prompt and em_my_blur as negative.
  • Compare the results!

Thanks for running that test. I have a very long prompt for both regular and negative prompt. The regular one can’t be compressed into one embedding because it’s too long apparently. So I had to take away all commas to make that one.

Regular: beautiful award-winning portrait photo, pale washed out 90s style, clear eyes, Holga camera style, retro, dreamy, nostalgic, grainy, blurry, soft focus, vignetting, light leaks, distorted, imperfect, moody, artistic, painterly, ethereal, whimsical, plastic lens, low fidelity, medium format, square format, film photography, toy camera, manual focus, fixed aperture, bulb mode, multiple exposure, zone focusing, plastic body, limited control, unpredictable results, experimental, lo-fi

Negative: ugly cartoon drawing, blurry, blurry, blurry, blurry, hands, hands, hands, hands, double heads, deformities

In front of the regular prompt I use “a woman” or “a man” etc.

Question: can I split the regular prompt into several parts and make several embeddings that I then merge together and so get the same result as just using the prompt?

aleksusklim commented 1 year ago

I have a very long prompt for both regular and negative prompt.

WebUI implicitly splits large prompts at the last comma before 75 tokens, and then combines all parts with an invisible BREAK statement. For more control over this splitting, you can put an explicit BREAK before you reach 75 tokens.

So I had to take away all commas to make that one.

Why do you need to create embedding of the entire prompt in the first place?

split the regular prompt into several parts and make several embeddings that I then merge together

Merge or concatenate? To merge things, you may use either the BREAK keyword or the EmbeddingMerge inline syntax <'first prompt'+'second prompt'>

get the same result as just using the prompt

I don't understand the reasons behind it, but yes: this is technically possible, though it makes little sense. When you have a very large prompt that ends in "… vvv www, xxx yyy, zzz", delete text from the end until the token counter shows 75/75. Then cut it at the last comma. For example, let's assume that 75/75 is shown when your string is "… vvv www, xxx". Cut at the comma and you get 73/75 as "… vvv www". This will be your first embedding; save it. Then take the rest of your prompt ("xxx yyy, zzz"), and if it is also larger than 75 tokens, do this again; otherwise, it will be your last embedding.

Then you should be able to use your new embeddings in one string, separated with spaces (not commas anymore). The cut positions matter because if you split the original prompt at the wrong position, the implicit BREAK will not be able to split your prompt inside an embedding!

For example, let's pretend that the limit is 5 and not 75. If you have

one two, four five, seven eight

(where tokens 3 and 6 are commas), WebUI would split it into

one two BREAK four five, seven eight

(I'm not sure whether it deletes the comma when inserting the break or leaves it at the end of the left part). If you add something short on the left side, sometimes it will not shift the break position, for example:

zero one two BREAK four five, seven eight
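The cut-at-the-last-comma procedure can be sketched in a few lines of Python. This is only a toy model: `toy_tokens`, `join_tokens` and `split_at_last_comma` are hypothetical helper names, and the tokenizer simply counts every word and every comma as one token, which is an assumption made for illustration. Real token counts come from the model's tokenizer, as shown by the WebUI token counter, and WebUI's own splitting code may differ in details.

```python
import re


def toy_tokens(prompt):
    """Stand-in tokenizer (assumption): each word and each comma is one token."""
    return re.findall(r"[^\s,]+|,", prompt)


def join_tokens(tokens):
    """Rebuild text from toy tokens, attaching commas to the preceding word."""
    return " ".join(tokens).replace(" ,", ",")


def split_at_last_comma(prompt, limit=75):
    """Cut a prompt into pieces of at most `limit` toy tokens each.

    Follows the manual procedure from the thread: keep the first `limit`
    tokens, cut at the last comma inside them, and repeat on the rest.
    """
    tokens = toy_tokens(prompt)
    pieces = []
    while len(tokens) > limit:
        window = tokens[:limit]
        if "," in window:
            # Index of the last comma inside the window.
            cut = len(window) - 1 - window[::-1].index(",")
            # The comma itself is dropped from both sides.
            piece, tokens = tokens[:cut], tokens[cut + 1:]
        else:
            # No comma to cut at: fall back to a hard cut at the limit.
            piece, tokens = window, tokens[limit:]
        if piece:
            pieces.append(join_tokens(piece))
    if tokens:
        pieces.append(join_tokens(tokens))
    return pieces


# The toy example with a limit of 5 instead of 75.
print(split_at_last_comma("one two, four five, seven eight", limit=5))
print(split_at_last_comma("zero one two, four five, seven eight", limit=5))
```

Under these assumptions, both calls cut at the first comma, which is the position where BREAK appears in the toy example; each returned piece would then be saved as its own embedding.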

If you try to create an embedding from the whole prompt, it will fail. It also fails at the second comma. But it will succeed exactly in the middle, like

one two, four five

Then your second embedding will be

seven eight

At the end you will have:

[one two, four five], [seven eight]

WebUI has to put BREAK just after your large embedding, and also before it if you add anything to the left. You will end up with

zero BREAK [one two, four five] BREAK , [seven eight]

which is not what you were getting without embeddings.

Thus, if you use large prompts, you need to split them manually at the same positions where they are cut internally. I recommend abandoning large prompts with implicit breaking and using an explicit BREAK each time you hit the 75-token limit. This way you can reserve some space (for example, by breaking at 60 or 50) so that added words won't shift the slice and drastically change the whole composition.
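As a rough illustration of breaking early to reserve space, the sketch below packs whole comma-separated phrases into chunks that stay under a chosen budget and joins the chunks with an explicit BREAK. The names `count_toy_tokens` and `add_explicit_breaks` are hypothetical, and the token count is again a stand-in that treats every word and comma as one token; with a real prompt you would check the counts against the WebUI token counter.

```python
import re


def count_toy_tokens(text):
    """Stand-in token count (assumption): words and commas count as one token each."""
    return len(re.findall(r"[^\s,]+|,", text))


def add_explicit_breaks(prompt, budget=60):
    """Greedily pack comma-separated phrases into chunks of at most `budget`
    toy tokens and join the chunks with an explicit BREAK keyword.

    A single phrase longer than the budget is left alone in its own chunk.
    """
    phrases = [p.strip() for p in prompt.split(",") if p.strip()]
    chunks, current = [], []
    for phrase in phrases:
        candidate = ", ".join(current + [phrase])
        if current and count_toy_tokens(candidate) > budget:
            # The phrase does not fit: close the chunk and start a new one.
            chunks.append(", ".join(current))
            current = [phrase]
        else:
            current.append(phrase)
    if current:
        chunks.append(", ".join(current))
    return " BREAK ".join(chunks)


# Toy budget of 5 instead of 60, reusing the example prompt.
print(add_explicit_breaks("one two, four five, seven eight", budget=5))
```

Choosing a budget below 75 leaves headroom in every chunk, so a few words added later stay inside their chunk instead of pushing text across a break.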

Personally, I use EmbeddingMerge to somewhat shorten my large prompts, for example typing <'white'+'short'> dress instead of white short dress or white dress, short dress (sometimes it works as I want, sometimes it doesn't…). But I never convert a whole prompt to embeddings, because it gives no benefit while hiding things from the generation info! (I mean, you won't be able to recover which words were in your embedding, since it will be seen as external, not as created by EM on the fly as with the <''> syntax.)
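To see why a merged expression can take less room than the separate words, here is a minimal sketch that assumes the + inside <'white'+'short'> adds the token vectors element-wise. The thread does not spell out the arithmetic, so treat this as an assumption; `merge_sum` is a hypothetical helper and the three-component vectors are made-up numbers, far smaller than real embedding vectors.

```python
def merge_sum(a, b):
    """Element-wise sum of two equally sized vectors (assumed meaning of '+')."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


# Made-up toy vectors standing in for the embeddings of two one-token words.
white = [0.5, -1.0, 0.25]
short = [0.25, 0.5, -0.75]

# Two one-token words collapse into a single vector, so the merged
# expression would occupy one slot in the prompt instead of two.
merged = merge_sum(white, short)
print(merged)  # [0.75, -0.5, -0.5]
```

The merged vector is not equivalent to writing both words, which fits the remark that it sometimes works as wanted and sometimes does not.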