ljleb / prompt-fusion-extension

auto1111 webui extension for all sorts of prompt interpolations!
MIT License

Include prompt blending #75

Open · SirVeggie opened this issue 8 months ago

SirVeggie commented 8 months ago

Hey there, I was wondering if it would be a good idea to include prompt blending in this extension? The old script is no longer being developed and no longer works with the newest version of auto1111, but I did get it working with a small tweak.

I think it's very similar functionality to this extension, and it would be convenient to have both in one extension. The syntax might need a bit of tweaking so it doesn't clash with dynamic prompts and fits well with the existing syntax of this extension.

What do you think?

ljleb commented 8 months ago

You might want to know that there is a new prompt blending extension that makes it possible to do custom blends with a more intuitive syntax: https://github.com/klimaleksus/stable-diffusion-webui-embedding-merge

This seems to match your description, but please let me know if it's different and I can start looking for a hole in the syntax where this would go.

SirVeggie commented 8 months ago

I've used that extension for a long time as well, but the prompt blending script does a much better job at actually blending things together for some reason. I'm not really familiar with the internal logic, so I can't say why that would be.

ljleb commented 8 months ago

I'll look into this then. The worst-case scenario is that we introduce some feature redundancy, I guess.

I'm thinking of using something like this:

[a : b : c : 0.33, 0.33, 0.33 : mean]

Where a, b, c are prompts and the 0.33 values are the respective weights of the average. Calling it "mean" is slightly wrong; technically it should be "weighted sum", but "mean" makes for a shorter keyword.
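
Under the hood this would just be a weighted sum of the prompts' conditioning tensors. A minimal sketch of the idea, assuming same-shape torch tensors (not the actual extension code):

import torch

def blend(conds: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    # weighted sum of the prompt conditionings
    return sum(w * c for w, c in zip(weights, conds))

# [a : b : c : 0.33, 0.33, 0.33 : mean] would then roughly be:
# blended = blend([cond_a, cond_b, cond_c], [0.33, 0.33, 0.33])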

You'd be able to infer equal weights summing up to 1 like this:

[a : b : c : d : e : , , , , : mean]

Having to put the right number of commas can be annoying, though. I'll update you on whether I successfully make the parser work with this:

[a : b : c : d : e :: mean]

SirVeggie commented 8 months ago

Looks good. Being able to omit the commas when possible would be a life saver. Are the weights relative to each other? I assume [a : b : c : 1, 1, 1 : mean] would be the same as using 0.33.

Other keyword alternatives could be the shortened avg, or static to highlight the difference from the other features in the extension.

ljleb commented 8 months ago

Are the weights relative to each other?

That's a good question. It would come in handy in most cases, but at the same time it would prevent using a weight combination that sums to e.g. 1.1 or 0.9, which may or may not be interesting to play with. I'll think about this; there may be a way to allow both rescaled weighting and absolute weighting.
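
To illustrate the difference, a quick hypothetical sketch of the two behaviours:

def normalize(weights: list[float]) -> list[float]:
    # rescaled weighting: the weights always end up summing to 1
    total = sum(weights)
    return [w / total for w in weights]

normalize([1, 1, 1])   # -> [0.33..., 0.33..., 0.33...], same as writing 0.33 each
normalize([0.6, 0.5])  # -> [0.54..., 0.45...], the 1.1 total is lost
# absolute weighting would skip this step and keep [0.6, 0.5] as-is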

ljleb commented 8 months ago

I could try to change the parser so that we can put multiple words in the last : ... ] segment. Then we could use mean absolute or absolute mean to prevent weight normalization.

SirVeggie commented 8 months ago

Allowing both could be achieved in a few ways:

1. Hijacking the prompt weight for this specific case: ([a : b : c : 1, 1, 1 : mean]:1.1) -> relative, but scaled so that the parts sum to 1.1. This is clearer in syntax, but causes behaviour that is slightly unexpected, since it's not the same as the native weight.
2. An optional extra number: [a : b : c : 1, 1, 1, 1.1 : mean]. This looks more confusing but doesn't mess with the native weighting.
3. Adding the total weight as an extra after the keyword: [a : b : c : 1, 1, 1 : mean : 1.1]. This approach is clearer to the eye and also doesn't mess with the native weights.

SirVeggie commented 8 months ago

I like your approach.

ljleb commented 8 months ago

Before anything, thanks for the suggestions!

I think option 1 would introduce behavior that is a bit surprising: the syntax looks the same as the built-in token weighting but has a completely different behavior.

Option 2 could work, I think; however, it is hard to tell at a glance whether or not there is an extra number in the list.

Option 3 unfortunately can't work, because the parser has to see a word in the last : ... ] segment to detect a prompt interpolation construct, which is what distinguishes it from prompt weighting [ ... ] and prompt editing [ ... : ... : num ].
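
For context, the disambiguation works roughly like this (a hypothetical sketch, not the actual grammar):

def classify(bracket_contents: str) -> str:
    segments = [s.strip() for s in bracket_contents.split(":")]
    if len(segments) == 1:
        return "prompt weighting"     # [ a ]
    if segments[-1].replace(".", "", 1).isdigit():
        return "prompt editing"       # [ a : b : 0.5 ]
    return "prompt interpolation"     # [ a : b :: mean ]

With [a : b : c : 1, 1, 1 : mean : 1.1], the last segment would be a number, so the whole thing would look like prompt editing to the parser.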

Thanks for these though, I really appreciate your input and the intention to make this feature as good as it can be.

ljleb commented 8 months ago

An alternative to 3 could be to use something like

[a : b : c : 1, 1, 1 : mean(1.2)]

which might be more useful than absolute weighting.
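
The number would just rescale the normalized weights, e.g.:

weights = [1, 1, 1]
scale = 1.2
effective = [scale * w / sum(weights) for w in weights]
# -> [0.4, 0.4, 0.4], i.e. a mean that sums to 1.2 instead of 1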

SirVeggie commented 8 months ago

Thanks for these though, I really appreciate your input and the intention to make this feature as good as it can be.

I intend to use it myself, so I'm fully serving my own interests here 😎

SirVeggie commented 8 months ago

I think, after all, using mean absolute or mean-abs (or whichever variation) along with absolute numbers makes the most sense.

ljleb commented 8 months ago

We can have both simultaneously. I can make the parentheses optional, so that both mean and mean(1) have the same effect, and then also have mean absolute or absolute mean for the other case. In the absolute case, the parentheses could be ignored.
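
Parsing-wise, the keyword part could look something like this (a rough sketch with hypothetical names):

import re

# matches "mean", "mean(1.2)", "absolute mean" and "mean absolute"
MEAN_RE = re.compile(r"(?:(absolute)\s+)?mean(?:\s+(absolute))?(?:\((\d+(?:\.\d+)?)\))?")

def parse_mean(keyword: str):
    match = MEAN_RE.fullmatch(keyword.strip())
    if match is None:
        return None
    absolute = match.group(1) is not None or match.group(2) is not None
    scale = float(match.group(3) or 1.0)  # parentheses optional, default to 1
    return absolute, scale

parse_mean("mean")           # -> (False, 1.0)
parse_mean("mean(1.2)")      # -> (False, 1.2)
parse_mean("absolute mean")  # -> (True, 1.0)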

In any case, I'll start with the base case of a normalized average with the keyword mean and then build on top of it with follow-up PRs. It's easy to fall into the over-engineering trap of implementing too many cases, and this will take forever if I try to do it all in one shot.

ljleb commented 8 months ago

I've implemented the basic version of the weighted average. It should work exactly as we described it earlier here. Let me know if you have any issues with the implementation and I'll look into them. I'm not closing this yet because absolute mean and mean(n) have not been implemented.

SirVeggie commented 8 months ago

It seems like using weights does not work yet. Neither [cat : frog : 1, 1 : mean] nor [cat : frog : 0.5, 0.5 : mean] yields the same result as [cat : frog :: mean]. I'm not sure if you only implemented the base case of :: or if there is a bug.

[cat : frog :: mean] does yield the correct result though 👍

ljleb commented 8 months ago

Unfortunately I cannot reproduce the issue. All prompts generate exactly the same output.

[cat:dog::mean]
Negative prompt: mediocre ugly abstract art [by xynon-bad-11k-2 : : , 0.2]
Steps: 24, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 4188154356, Size: 512x512, Model hash: e16273c410, Model: tmndMix_tmndMixVIIBakedvae, VAE hash: e169f60bfa, VAE: kl-f8-anime2.vae.pt, FreeU Stages: "[{\"backbone_factor\": 1.2, \"skip_factor\": 0.9}, {\"backbone_factor\": 1.4, \"skip_factor\": 0.2}]", FreeU Schedule: "0.0, 1.0, 0.0", FreeU Version: 2, CFG Rescale phi: 0, TI hashes: "xynon-bad-11k-2: 796c84e2a27c", Script: X/Y/Z plot, X Type: Prompt S/R, X Values: "::,\":1,1:\",\":0.5,0.5:\"", Version: v1.6.0

(attached X/Y/Z grid image: xyz_grid-0003-4188154356-cat_dog__mean)

SirVeggie commented 8 months ago

Interesting, I'll try to investigate a bit more.

ljleb commented 8 months ago

Maybe it's caused by an interaction with another extension?

SirVeggie commented 8 months ago

Most likely that would be it.

SirVeggie commented 8 months ago

Found the offending extension. After disabling it the feature seems to work great.

ljleb commented 8 months ago

Awesome. Which one is it? Maybe there's a way to make them compatible.

SirVeggie commented 7 months ago

Forgot to respond to this. The extension was this. It replaces spaces and commas, so the syntax of this extension was getting mangled. I simply disabled the other extension, as it's not certain whether it's even effective or just placebo. It would be difficult to make them compatible, and not worth the effort.

Hellisotherpeople commented 4 months ago

Huge thank you btw for implementing this! I'm convinced that techniques for representation/activation engineering like prompt blending are being massively undervalued by the AI community.