ljleb / sd-webui-neutral-prompt

Collision-free AND keywords for a1111 webui!
MIT License

[Questions] Regarding syntax: `:-1`, nesting prompts, precedence, formatting, etc. #70

Closed God-damnit-all closed 4 months ago

God-damnit-all commented 4 months ago

Question 1

What exactly is the :-1 in the examples? I see this same syntax used in the Composable Diffusion section of the A1111 wiki, but never is a negative number used there. How can a prompt have negative weight? Or is this something that affects CFG Scale?

Question 2

How does the syntax mentioned in the first question apply if I want to use it on a nested prompt, is it like this? (fullwidth characters used for emphasis)

magical tree forests, eternal city AND_PERP [  electrical pole voyage  AND_SALT small nocturne companion ] :1.5 AND_SALT [  electrical tornado  AND_SALT electric arcs, bzzz, sparks ]

Question 3

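Assuming the syntax in Question 2 is valid, how exactly is it split up? i.e. Is it like ...

All of this is affected by :1.5
    magical tree forests, eternal city AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ]

Or is it ...

Only this is affected by :1.5
    AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ]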

NOTE: I'm assuming that the line breaks aren't required for nesting prompts and they're just for nicer formatting, let me know if this is not the case. If the line breaks are required, pretend that I added them in the above. (Also, if line breaks are required, is the indentation required as well?)

Question 4

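How exactly are these prompt segments divvied up, is it like ...

All of this is affected by AND_SALT [ electrical tornado AND_SALT electric arcs, bzzz, sparks ]
    magical tree forests, eternal city AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ] :1.5

Or is it ...

Only this is affected by AND_SALT [ electrical tornado AND_SALT electric arcs, bzzz, sparks ]
    AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ]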

Question 5

Within nesting prompts, are these other syntaxes still valid?

magical tree forests, eternal city
AND_PERP [
    [electrical pole voyage:0.1]
    AND_SALT [[[small]]] nocturne [companion:ally:0.2]
] :1.5
AND_SALT [
    [electrical tornado::0.5]
    AND_SALT [electric arcs|bzzz|sparks]
]

(This uses the following: Prompt Editing, Attention/Emphasis, Alternating Words)

If valid, would it still be valid without the line breaks?

magical tree forests, eternal city AND_PERP [ [electrical pole voyage:0.1] AND_SALT [ [[small]]] nocturne [companion:ally:0.2] ] :1.5 AND_SALT [ [electrical tornado::0.5] AND_SALT [electric arcs|bzzz|sparks] ]

Question 6

Regarding the following quote in README.md ...

image

Was this a mistake? This is the first time I've heard of commas being an alias for AND.

If it's not a mistake, does this mean that electric arcs, bzzz, sparks is equivalent to electric arcs AND bzzz AND sparks? And if so, are AND_SALT electric arcs, bzzz, sparks & AND_SALT electric arcs AND bzzz AND sparks equivalent as well?

God-damnit-all commented 4 months ago

Question 7

This may or may not be relevant depending on what the answer to the previous question is, but is the nesting prompt syntax valid for the basic AND prompt, e.g. AND [ electric arcs AND_SALT bzzz ] ?

ljleb commented 4 months ago

What exactly is the :-1 in the examples? I see this same syntax used in the Composable Diffusion section of the A1111 wiki, but never is a negative number used there. How can a prompt have negative weight? Or is this something that affects CFG Scale?

A negative weight inverts the effect of the prompt: instead of being added, the prompt is removed from the generated result. This is in fact already supported by a1111 without the extension; it's just that some of the extension's prompt features are especially useful when a negative weight is applied. Notably, the Perp-Neg paper (first line of the readme) is implemented by applying a negative weight to an AND_PERP prompt.

Technically, the weight is applied on latent map deltas: (positive_map - negative_map) * scale. The latent maps associated with each prompt are all combined in delta space. Then, the last thing the extension does is add back the negative map (the one associated with the negative prompt) to turn the result back into an actual latent map, which can eventually be passed to the VAE, for example.
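A minimal sketch of the delta-space arithmetic described above, using plain Python lists as a stand-in for latent tensors. The function name `combine_in_delta_space` is made up for illustration, the CFG scale is left out, and the plain sum is only the simplest possible way of combining deltas; this is not the extension's actual code.

```python
from typing import List, Sequence, Tuple

Latent = List[float]  # stand-in for a real latent tensor


def combine_in_delta_space(
    negative_map: Latent,
    weighted_maps: Sequence[Tuple[Latent, float]],
) -> Latent:
    """Weight each (positive_map - negative_map) delta, sum the deltas,
    then add the negative map back to leave delta space."""
    total = [0.0] * len(negative_map)
    for positive_map, weight in weighted_maps:
        for i, (p, n) in enumerate(zip(positive_map, negative_map)):
            # A negative weight subtracts this prompt's delta instead of adding it.
            total[i] += (p - n) * weight
    return [n + d for n, d in zip(negative_map, total)]


negative = [0.0, 1.0]
result = combine_in_delta_space(
    negative,
    [([1.0, 1.0], 1.0),    # a prompt with the default weight of 1
     ([0.0, 3.0], -1.0)],  # a prompt written with :-1
)
print(result)
```

The second prompt's delta is `[0.0, 2.0]`; with a weight of -1 it pushes the result away from that direction rather than towards it.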

How does the syntax mentioned in the first question apply if I want to use it on a nested prompt, is it like this? (fullwidth characters used for emphasis)

magical tree forests, eternal city AND_PERP [  electrical pole voyage  AND_SALT small nocturne companion ] :1.5 AND_SALT [  electrical tornado  AND_SALT electric arcs, bzzz, sparks ]

Neutral-prompt will first compute whatever is enclosed between brackets [ ... ] as if it was a single prompt, and then that prompt will have a weight of 1.5. Technically, all of this happens at the latent level:

  1. electrical pole voyage gives you one latent map delta
  2. AND_SALT small nocturne companion gives you another latent map delta
  3. everything enclosed in [ ... ] will combine both latent map deltas above into a single one, using the keywords to determine how to do it
  4. this latent map delta is scaled by 1.5
  5. it is then combined further down the tree as if it had always been a single latent map delta from the beginning
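The five steps above can be sketched as a small recursive evaluation over a prompt tree. Everything here is hypothetical scaffolding: `Leaf`, `Group`, `evaluate` and `sum_ignoring_keywords` are illustrative names, and the combine step is a placeholder element-wise sum rather than the real AND_* strategies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union

Delta = List[float]  # stand-in for a latent map delta


@dataclass
class Leaf:
    text: str
    keyword: str = "AND"
    weight: float = 1.0


@dataclass
class Group:
    children: List[Union["Leaf", "Group"]] = field(default_factory=list)
    keyword: str = "AND"
    weight: float = 1.0


Node = Union[Leaf, Group]
Combine = Callable[[List[Tuple[str, Delta]]], Delta]


def evaluate(node: Node, leaf_delta: Callable[[str], Delta], combine: Combine) -> Delta:
    if isinstance(node, Leaf):
        delta = leaf_delta(node.text)  # steps 1 and 2: one delta per prompt
    else:
        tagged = [(c.keyword, evaluate(c, leaf_delta, combine)) for c in node.children]
        delta = combine(tagged)  # step 3: merge the children into a single delta
    # Step 4: the node's own :weight scales its delta. The caller then treats
    # the result like any other single delta (step 5).
    return [x * node.weight for x in delta]


def sum_ignoring_keywords(tagged: List[Tuple[str, Delta]]) -> Delta:
    """Placeholder strategy: element-wise sum, NOT the real AND_* math."""
    return [sum(values) for values in zip(*(d for _, d in tagged))]
```

With this shape, `AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ] :1.5` would be a `Group` with `keyword="AND_PERP"`, `weight=1.5` and two `Leaf` children, the second one carrying `keyword="AND_SALT"`.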

Assuming the syntax in Question 2 is valid, how exactly is it split up? i.e. Is it like ...

All of this is affected by :1.5
    magical tree forests, eternal city AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ]

Or is it ...

Only this is affected by :1.5
    AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ]

NOTE: I'm assuming that the line breaks aren't required for nesting prompts and they're just for nicer formatting, let me know if this is not the case. If the line breaks are required, pretend that I added them in the above. (Also, if line breaks are required, is the indentation required as well?)

It is the second. :weight only ever ties to the nearest prompt appearing right before it. It does not apply to any other prompt before or after. You are correct in your assumption that newlines are irrelevant to the parser: you could put everything on the same line and get the same result. In other words, [ ... ] acts as if it were a single prompt, and the rest of the syntax, like :weight or the AND_* keywords, has the same parsing rules as when the extension is disabled.
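To make the binding rule concrete, here is a rough sketch of a top-level splitter in which a trailing :weight attaches only to the prompt (or bracketed group) right before it. The function `split_top_level` and both regular expressions are assumptions written for this illustration; the extension's real parser may differ, and escaped brackets are not handled.

```python
import re
from typing import List, Tuple

KEYWORD = re.compile(r"\bAND(?:_[A-Z]+)?\b")
# Optional trailing ":number" at the very end of a prompt segment.
WEIGHT = re.compile(r"(.*?)(?:\s*:\s*(-?\d+(?:\.\d+)?))?\s*", re.S)


def split_top_level(prompt: str) -> List[Tuple[str, str, float]]:
    """Return (keyword, text, weight) for each prompt outside any [ ... ]."""
    pieces = []
    depth = 0
    keyword = "AND"  # the first prompt is always a plain AND prompt
    start = i = 0
    while i < len(prompt):
        ch = prompt[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Keywords only split the prompt when they are outside brackets.
            m = KEYWORD.match(prompt, i)
            if m:
                pieces.append((keyword, prompt[start:i]))
                keyword = m.group(0)
                start = i = m.end()
                continue
        i += 1
    pieces.append((keyword, prompt[start:]))

    result = []
    for kw, raw in pieces:
        m = WEIGHT.fullmatch(raw.strip())
        text, weight = m.group(1), m.group(2)
        result.append((kw, text.strip(), float(weight) if weight else 1.0))
    return result


for item in split_top_level("a AND_PERP [ b AND_SALT c ] :1.5 AND_SALT d :-1"):
    print(item)
```

Newlines are treated like any other whitespace here, so the multi-line and single-line spellings of a prompt split identically.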

How exactly are these prompt segments divvied up, is it like ...

All of this is affected by AND_SALT [ electrical tornado AND_SALT electric arcs, bzzz, sparks ]
    magical tree forests, eternal city AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ] :1.5

Or is it ...

Only this is affected by AND_SALT [ electrical tornado AND_SALT electric arcs, bzzz, sparks ]
    AND_PERP [ electrical pole voyage AND_SALT small nocturne companion ]

Actually, it's neither. I can see how this could be confusing. The way the prompt language works is that you have two categories of prompts: normal AND prompts (including the first prompt, which is always an AND prompt), and auxiliary prompts, which are prefixed with an AND_* keyword such as AND_PERP, AND_SALT or AND_TOPK.

First, the extension calculates the normal AND prompts and combines them into a single latent map delta, just like the webui would normally do. Then, it calculates the auxiliary latent map delta of each auxiliary prompt. Finally, it combines everything at that level using the appropriate strategies (determined by the AND_* keyword prefixing each prompt) and moves on to the next prompt group [ ... ] or to the top-level prompt group.

This is because auxiliary prompts all rely on AND prompts declared at their level to do their work. For example, in

a catgirl AND_PERP a furry :-1

the latent map delta of a furry uses the latent map delta of a catgirl to determine the perpendicular component at each step. This is how the extension gives you control over the perpendicular component used to orthogonalize a latent map delta.
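As an illustration of what "perpendicular component" means here, the sketch below uses the standard vector projection formula: the part of the auxiliary delta that is parallel to the base delta is removed, and only the remainder is applied with the prompt's weight. The helper names are hypothetical and flat lists stand in for latent tensors; whether the extension computes it in exactly this way is an assumption.

```python
from typing import List

Delta = List[float]


def dot(a: Delta, b: Delta) -> float:
    return sum(x * y for x, y in zip(a, b))


def perpendicular_component(aux: Delta, base: Delta) -> Delta:
    """Remove from `aux` the part that is parallel to `base`."""
    base_sq = dot(base, base)
    if base_sq == 0.0:
        return list(aux)  # nothing to project onto
    scale = dot(aux, base) / base_sq
    return [a - scale * b for a, b in zip(aux, base)]


def and_perp(base: Delta, aux: Delta, weight: float) -> Delta:
    """Add the weighted perpendicular part of `aux` to `base`."""
    perp = perpendicular_component(aux, base)
    return [b + weight * p for b, p in zip(base, perp)]


catgirl = [1.0, 0.0]  # pretend delta for "a catgirl"
furry = [2.0, 3.0]    # pretend delta for "a furry"
print(and_perp(catgirl, furry, weight=-1.0))
```

Because the parallel part is discarded, the negative weight only removes what "a furry" does not share with "a catgirl", which is the collision-free behaviour the AND_PERP keyword is meant to provide.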

Within nesting prompts, are these other syntaxes still valid?

magical tree forests, eternal city
AND_PERP [
    [electrical pole voyage:0.1]
    AND_SALT [[[small]]] nocturne [companion:ally:0.2]
] :1.5
AND_SALT [
    [electrical tornado::0.5]
    AND_SALT [electric arcs|bzzz|sparks]
]

(This uses the following: Prompt Editing, Attention/Emphasis, Alternating Words)

If valid, would it still be valid without the line breaks?

magical tree forests, eternal city AND_PERP [ [electrical pole voyage:0.1] AND_SALT [ [[small]]] nocturne [companion:ally:0.2] ] :1.5 AND_SALT [ [electrical tornado::0.5] AND_SALT [electric arcs|bzzz|sparks] ]

Yes. The way the extension parses prompt groups [ ... ] is by looking for capitalized AND_* keywords inside (either AND or any other auxiliary AND_* keywords). Any square brackets enclosing normal text without any AND_* keyword, for example

[a [b : c : 0.5]]

will not be tampered with by the extension and will keep its original meaning: a [b : c : 0.5] with emphasis 1/1.1, and [b : c : 0.5] as prompt editing. You can even use extensions such as prompt-fusion, and they will also keep working:

[a | b | [c:d:e:3,5,8] : 0.25 ]

Again, whitespace is not significant.
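The rule for telling a prompt group apart from ordinary square brackets can be sketched as a check for a capitalized AND or AND_* keyword inside the brackets. `is_prompt_group` is a hypothetical helper for illustration, not a function from the extension, and it does not try to handle escaped brackets.

```python
import re

# Capitalized AND, or AND_ followed by capital letters (AND_PERP, AND_SALT, ...).
AND_KEYWORD = re.compile(r"\bAND(?:_[A-Z]+)?\b")


def is_prompt_group(bracketed: str) -> bool:
    """True if `[ ... ]` contains an AND/AND_* keyword, i.e. it is a group."""
    inner = bracketed.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        return False
    return AND_KEYWORD.search(inner[1:-1]) is not None


print(is_prompt_group("[a AND_SALT b AND_SALT c]"))        # treated as a group
print(is_prompt_group("[a [b : c : 0.5]]"))                # left to the webui
print(is_prompt_group("[a | b | [c:d:e:3,5,8] : 0.25 ]"))  # left to the webui
```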

Regarding the following quote in README.md ... image Was this a mistake? This is the first time I've heard of commas being an alias for AND.

If it's not a mistake, does this mean that electric arcs, bzzz, sparks is equivalent to electric arcs AND bzzz AND sparks? And if so, are AND_SALT electric arcs, bzzz, sparks & AND_SALT electric arcs AND bzzz AND sparks equivalent as well?

I think you misinterpreted the prompt, which literally contains magical tree forests, eternal city. The first prompt in the positive textbox is always an AND prompt (as I explained earlier for question 4), which is why the readme says that the root prompt group is just that text. No, commas are not replaced by AND keywords.

This may or may not be relevant depending on what the answer to the previous question is, but is the nesting prompt syntax valid for the basic AND prompt, e.g. AND [ electric arcs AND_SALT bzzz ] ?

It is still valid, yes. This is useful in case you want to contribute auxiliary prompt effects into a normal AND prompt, which you can then use later for example as the base for orthogonalization by a different prompt, i.e. [a AND_SALT b AND_SALT c] AND_PERP d :-1.

I appreciate your questions, they are very appropriate. If I haven't successfully cleared up everything yet, please feel free to let me know and I'll do my best to help.

God-damnit-all commented 4 months ago

I'm still trying to wrap my head around the way the precedence works, but that aside, I am particularly confused by one thing:

[a AND_SALT b AND_SALT c] AND_PERP d :-1

This makes me think of all sorts of ways that the syntax for decreasing emphasis could potentially break things. Is it because there's an AND statement inside that it's not interpreted that way? Or does it have to do with the spaces? A1111's documentation doesn't make it clear if you can surround space-separated words in the various syntaxes for emphasis.

As an aside, I do have a question not related to Neutral Prompts at all, I hope you'll forgive me for asking: I've heard mixed things about how negative prompts work, some say that spaces don't matter and each word is interpreted individually, some say that you can just replace the space with an underscore or even escape the whitespace with a backslash for the same effect. Others say that the way negative prompts work changed at some point and now it's just comma separated. How does it really work, and is there any special statement normally used in positive prompts that can be used in negative prompts for some reason (maybe even ones from certain extensions)?

ljleb commented 4 months ago

This makes me think of all sorts of ways that the syntax for decreasing emphasis could potentially break things. Is it because there's an AND statement inside that it's not interpreted that way? Or does it have to do with the spaces? A1111's documentation doesn't make it clear if you can surround space-separated words in the various syntaxes for emphasis.

Since the square brackets contain 1 or more AND_SALT keywords, it is interpreted as grouping. If you want it to be emphasis for a, b and c, you simply use [a] AND_SALT [b] AND_SALT [c] AND_PERP d :-1 for example. Does this help?

As an aside, I do have a question not related to Neutral Prompts at all, I hope you'll forgive me for asking: I've heard mixed things about how negative prompts work, some say that spaces don't matter and each word is interpreted individually, some say that you can just replace the space with an underscore or even escape the whitespace with a backslash for the same effect. Others say that the way negative prompts work changed at some point and now it's just comma separated. How does it really work, and is there any special statement normally used in positive prompts that can be used in negative prompts for some reason (maybe even ones from certain extensions)?

AFAIK, the negative prompt is not tampered with in any special way. People will use all sorts of ways to separate words for many reasons, but really you should think of the negative prompt box as describing precisely the image you do NOT want to generate. It goes through exactly the same process as each positive prompt for denoising. The only difference is how it is used in the CFG equation and by neutral prompt to send the individual latent maps to delta space.

In practice, the text in the negative text box is sent as-is to the text encoder for tokenizing, the tokens are then passed to the transformer model, and, taking clip skip into account, the resulting encoded vectors are used to condition the model's cross-attention layers.

Something to think of when crafting a negative prompt is that each comma or underscore or whatever other token will take one more token than if you just used whitespace, which you may or may not want. Personally, I like to use words instead of commas, if I really want to separate words conceptually, for the simple reason that it makes the prompt more varied. Since each comma takes a token, you could instead use words like of or into or under and it will take exactly the same amount of tokens but give the model richer information to work with.

ljleb commented 4 months ago

A1111's documentation doesn't make it clear if you can surround space-separated words in the various syntaxes for emphasis.

Missed that bit. In the original implementation, square brackets [ ... ] simply catch all the tokens that the enclosed text splits into, and all of those tokens have their scale decreased. The grouping rules are the same as for attention emphasis ( ... ); the only difference is that the weight is 1/1.1 instead of 1.1. For example, [a cat] will generate the exact same image as [a] [cat] (you can test it by pinning your seed to a fixed value).
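A small sketch of the bracket arithmetic: each enclosing ( ) multiplies a word's weight by 1.1 and each enclosing [ ] divides it by 1.1. `word_multipliers` is a hypothetical helper that works on whitespace-separated words instead of real tokens and ignores explicit `(text:1.3)` weights and prompt editing.

```python
from typing import List, Tuple


def word_multipliers(prompt: str) -> List[Tuple[str, float]]:
    """Pair every word with the multiplier implied by its bracket nesting."""
    result: List[Tuple[str, float]] = []
    round_depth = square_depth = 0
    word = ""

    def flush() -> None:
        nonlocal word
        if word:
            result.append((word, 1.1 ** (round_depth - square_depth)))
            word = ""

    for ch in prompt:
        if ch in "()[]" or ch.isspace():
            flush()  # the word ends before the bracket changes the depth
            if ch == "(":
                round_depth += 1
            elif ch == ")":
                round_depth -= 1
            elif ch == "[":
                square_depth += 1
            elif ch == "]":
                square_depth -= 1
        else:
            word += ch
    flush()
    return result


print(word_multipliers("[a cat]"))
print(word_multipliers("[a] [cat]"))
print(word_multipliers("[[[small]]] nocturne"))
```

The first two calls produce the same list, which matches the claim that [a cat] and [a] [cat] are equivalent; three nested square brackets give a multiplier of 1.1 to the power of -3.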

ljleb commented 4 months ago

I think part of the confusion stems from how the auxiliary prompts are combined.

To compute AND_SALT, the extension picks the combined AND latent map delta and all AND_SALT latent map deltas within the same prompt group, puts them all in a list, and then combines them according to how the readme describes it. AND_SALT does not see any other type of prompts than AND and AND_SALT, it does not see for example AND_PERP nor AND_TOPK, which are computed separately and then added to the final result for that group.

Something similar goes for all auxiliary prompts, they only see AND prompts and themselves. AND_SALT is special in that it needs to compute all AND_SALT inside each prompt group together, where other auxiliary prompt types like AND_PERP and AND_TOPK only see themselves and the combined AND latent map delta.

You can think of the auxiliary prompts within the same prompt group as being computed separately and then the resulting latent map delta for each type of prompt is added in an arbitrary order with the math operation +, which is commutative. Since the combination of auxiliary latent map deltas is commutative, the order doesn't matter.
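The order independence can be sketched like this: auxiliary deltas are bucketed by keyword, each bucket is turned into one contribution, and the contributions are added to the combined AND delta. `combine_group` and `placeholder_strategy` are hypothetical names; the placeholder is a plain sum and not the real AND_PERP or AND_SALT math from the readme.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Delta = List[float]
Strategy = Callable[[Delta, List[Delta]], Delta]


def add(a: Delta, b: Delta) -> Delta:
    return [x + y for x, y in zip(a, b)]


def placeholder_strategy(base: Delta, deltas: List[Delta]) -> Delta:
    """Stand-in for a real AND_* strategy: sums the deltas, ignores `base`."""
    return [sum(values) for values in zip(*deltas)]


def combine_group(
    base: Delta,
    aux: Sequence[Tuple[str, Delta]],
    strategies: Dict[str, Strategy],
) -> Delta:
    # Bucket auxiliary deltas by keyword: each type only sees the combined
    # AND delta (`base`) and the deltas of its own type.
    by_keyword: Dict[str, List[Delta]] = defaultdict(list)
    for keyword, delta in aux:
        by_keyword[keyword].append(delta)
    result = list(base)
    for keyword, deltas in by_keyword.items():
        result = add(result, strategies[keyword](base, deltas))
    return result


strategies = {"AND_PERP": placeholder_strategy, "AND_SALT": placeholder_strategy}
a = [1.0, 2.0]
b, c, d, e = [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 0.0]
first = combine_group(a, [("AND_PERP", b), ("AND_PERP", c), ("AND_SALT", d), ("AND_SALT", e)], strategies)
second = combine_group(a, [("AND_SALT", e), ("AND_PERP", c), ("AND_PERP", b), ("AND_SALT", d)], strategies)
print(first == second)
```

The values are chosen so the floating-point sums are exact; with real latents, reordering could in principle change the last few bits of a sum.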

You can test this by randomly switching the order in which the prompts appear:

a
AND_PERP b
AND_PERP c
AND_SALT d
AND_SALT e

will give you exactly the same image as

a
AND_SALT e
AND_PERP c
AND_PERP b
AND_SALT d

God-damnit-all commented 4 months ago

Apologies for the late reply, been rather busy. Thank you for the additional context. The most recent reply you gave is particularly insightful and will help me prompt going forward.

I think the last thing I'm truly confused on is the BREAK keyword. I don't even know where to begin asking about it. I get that it's meant to separate different concepts or something. Do neutral prompts handle BREAK in any sort of special way? What happens if BREAK occurs inside of a prompt grouping made with square brackets?

ljleb commented 4 months ago

The webui by default splits a prompt into chunks of 75 tokens. Each chunk is sent to the text encoder to be encoded separately. Since different chunks are not encoded together, you can typically use them to represent different concepts where each is self-contained.

BREAK makes the cut happen sooner than 75 tokens, allowing you to encode chunks with less text in each.
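A simplified sketch of the chunking behaviour: a chunk is closed when it reaches 75 tokens, or earlier when BREAK is encountered. Whitespace-separated words stand in for real text encoder tokens, `split_into_chunks` is a hypothetical name, and details of the real webui such as padding each chunk are not modelled.

```python
from typing import List

CHUNK_SIZE = 75


def split_into_chunks(prompt: str, chunk_size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split words into chunks of at most `chunk_size`, cutting early at BREAK."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for word in prompt.split():
        if word == "BREAK":
            # Close the current chunk even though it is not full yet.
            if current:
                chunks.append(current)
                current = []
            continue
        current.append(word)
        if len(current) == chunk_size:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("a red car BREAK a blue house"))
print([len(chunk) for chunk in split_into_chunks(" ".join(["w"] * 80))])
```

Each resulting chunk would then be encoded separately, which is why text on either side of a BREAK does not share context in the text encoder.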

Do neutral prompts handle BREAK in any sort of special way?

No, it doesn't. BREAK is a purely text encoder thing and neutral prompt syntax only has an effect on the corresponding latent maps.

What happens if BREAK occurs inside of a prompt grouping made with square brackets?

The prompt is parsed and split into chunks of tokens as the webui would normally do it without the extension.

Let me know if I can provide anything else to help with the syntax!