vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Feature]: Escaping colon in prompt #2754

Open Iluvalar opened 8 months ago

Iluvalar commented 8 months ago

Issue Description

As stated in a previous issue (https://github.com/vladmandic/automatic/issues/1071), escaping a colon does not actually escape the colon.

For example, if I prompt "(blue\:3)"

one would expect "['blue:3', 1.1]", not "['blue\', 3.3]".

It would always be possible to prompt "(blue\\:3)" if the latter was ever desired.

In other words, I believe we'd need an "elif text == ':'" branch on line 200, with the relevant code on line 201, of

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/c1c27dad3ba371a5ae344b267c760aa51e77f193/modules/prompt_parser.py
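The requested behavior can be sketched as below. This is a minimal, simplified illustration and not the code from the linked prompt_parser.py: the names TOKEN_RE and parse_weighted are hypothetical, only round brackets are handled, and the 1.1 default multiplier is taken from the example above. The one point it demonstrates is that an escaped "\:" is consumed as literal text before the "explicit weight" rule gets a chance to match.

```python
import re
from typing import List

# Hypothetical, simplified token pattern (an assumption for illustration only).
TOKEN_RE = re.compile(
    r"""
    \\[():\\]                            |  # escaped special character
    \(                                   |  # opens an emphasis group
    :\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*\)  |  # explicit weight that closes a group
    \)                                   |  # closes a group with the default weight
    [^\\():]+                            |  # run of ordinary text
    .                                       # leftover single character
    """,
    re.X | re.S,
)


def parse_weighted(prompt: str) -> List[list]:
    """Return [[text, weight], ...] for a prompt using (text:weight) syntax."""
    res: List[list] = []
    open_idx: List[int] = []  # positions in res where each "(" was opened

    def multiply(start: int, factor: float) -> None:
        for item in res[start:]:
            item[1] *= factor

    for m in TOKEN_RE.finditer(prompt):
        text, weight = m.group(0), m.group(1)
        if len(text) == 2 and text[0] == "\\":
            res.append([text[1], 1.0])          # keep the escaped char as literal text
        elif text == "(":
            open_idx.append(len(res))
        elif weight is not None and open_idx:
            multiply(open_idx.pop(), float(weight))
        elif text == ")" and open_idx:
            multiply(open_idx.pop(), 1.1)
        else:
            res.append([text, 1.0])

    for start in open_idx:                      # unclosed groups still get emphasis
        multiply(start, 1.1)

    merged: List[list] = []                     # join neighbours with equal weight
    for text, weight in res:
        if merged and merged[-1][1] == weight:
            merged[-1][0] += text
        else:
            merged.append([text, weight])
    return merged


print(parse_weighted(r"(blue\:3)"))   # escaped colon stays in the text
print(parse_weighted("(blue:3)"))     # unescaped colon is read as a weight
print(parse_weighted(r"(blue\\:3)"))  # escaped backslash, then a real weight
```

With this ordering, a literal backslash before a weight remains expressible by doubling it, which is the fallback described above.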

Version Platform Description

No response

Relevant log output

No response

Backend

Original

Branch

Master

Model

SD 1.5


brknsoul commented 8 months ago

Use case: make someone happier for the first 5 steps, [\:D::5], or only after 5 steps, [:\:D:5]

vladmandic commented 8 months ago

For example, if I prompt "(blue:3)" one would expect "['blue:3', 1.1]" not "['blue', 3.3]"

i would expect that it parses to ['blue', 3.0] since 3 is just the int version of float 3.0. what is the use-case of expanding it to ['blue:3', 1.1]?

in either case, this is not an issue - if you want to change the behavior, that would be a feature request and i'm reluctant to ok it without a strong use-case.

Iluvalar commented 7 months ago

"dex:15" <- I'm generating a D&D character
"manacost:3" <- I'm making a magic the gathering card
"0:0" <- I want that on a sign. Just because I want that on a sign.
"age:23" <- why not?
"quantity:5"
"codename:007"
"page:6"
":3" <- known emoticon
"pi:3.1416"

I do realize, now that I'm experimenting with it, that I can hack around it by adding a random comma after the prompt and at least get an image. But it's not the same as being able to prompt it. I'm curious to prompt "codename:007", not "codename:007 ." or "codename:007," or "codename:007 \". These all give a different James Bond with the same seed.

Because there are workarounds, it's not the strongest of use cases. However, I still think one would expect to be able to escape the ":" character. It shouldn't be too hard to implement and it will only affect a minimal number of prompts. Besides "/!\:1.4" I can't really imagine what other prompt this would affect. And one would only need to prompt "/!\\:1.4" to get the desired result.

Edit: Git escaped my "\:" characters in the message.

vladmandic commented 7 months ago

my concern is that i'd be changing normal/documented/expected behavior to fit an edge-case where behavior is anything but documented. why? let's skip weights and just look at the raw prompt quantity:5. tokenizer and text-encoder will happily do something, but what is that something? does clip actually define what is the meaning of such a prompt?

only way to tell (that i can think of) is to run raw prompt via tokenizer and encoder, get tokens and then do reverse lookups for those tokens and compare with different variations - for example, what is the meaning of quantity:5 vs quantity,5 or just quantity 5 or even quantity5 without space (since tokenizer breaks words anyhow)? that's a valid and interesting thing to analyze, but i cannot afford the time right now.

so if you want this feature, then you'd have to explain what quantity:5 exactly does. not visually, but actual behavior. modifying prompt parser to fit an edge case of undocumented behavior would be less than ideal.
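A rough first step toward the comparison described above is to look at how each variant is split into pieces before any BPE merging happens. The sketch below is an approximation written with the standard library only: PIECE_RE imitates the letters / single digit / punctuation-run split used by CLIP's reference tokenizer, and pretokenize is a hypothetical helper name. It does not perform BPE, look up token ids, or run the text encoder, so it says nothing about what the pieces mean to the model.

```python
import re
from typing import List

# Assumption: stdlib approximation of a letters | single digit | punctuation-run
# split. The real tokenizer uses Unicode property classes and then applies BPE.
PIECE_RE = re.compile(r"[^\W\d_]+|\d|(?:[^\s\w]|_)+")


def pretokenize(prompt: str) -> List[str]:
    """Lowercase the prompt and split it into pre-BPE pieces."""
    return PIECE_RE.findall(prompt.lower())


for variant in ["quantity:5", "quantity,5", "quantity 5", "quantity5"]:
    print(f"{variant!r:14} -> {pretokenize(variant)}")
```

Under this approximation, "quantity 5" and "quantity5" split into the same pieces, while the ":" and "," variants each carry one extra punctuation piece. Confirming the actual token ids and their effect would still need the real tokenizer and encoder.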

Iluvalar commented 7 months ago

It's already documented. It is in the documentation that people should use "\" to signify the backslash.

I certainly understand that your time is limited. And this is obviously not a life-or-death matter. I'm just curious to explore the usage of a new character and find myself limited by the lack of an escape character for it. But I will survive; it's pure curiosity.

vladmandic commented 7 months ago

i mean what is the documented behavior of quantity:5 in clip tokenizer/text-encoder?

Iluvalar commented 7 months ago

Yes, I understand your question. I was trying to dodge it. :D

Using the token inspector in automatic1111, it seems that clip has plenty of tokens that contain ":". It seems to me that it interprets it just like any other character.

However, the way you ask this makes me fear there is some deeper sense that I'm missing in your question... I don't see why clip would treat ":" as a special character, or how. But I have no quick way to test it, as I've been spoiled by automatic for most of my image-generating experience.

Similar tokens: :-)(4223) ;-)(10475) :-((25137) :)(1408) 😃(8520) :-(13021) 😀(7334) 😊(3020) :))(10164) ;)(3661) 🙂(14860) :)))(21840) 😃(14079) :-(10323) 😄(7624) :((5965) 😊😊😊(28685) 😀(12832) .....(3104) :')(10701) ....(1390) =)(17840) :/(13604) 😊(4692) 😁(4821) ☺(8703) 😋(7203) ))(5167) 😊😊(20842) lol(1824)

vladmandic commented 7 months ago

so all the known instances of : in clip are emojis - not surprising. and that's my point - if we do escaping as you suggest and pass quantity:5 to clip as-is, what does that even mean to clip? it's not recognized so tokenizer will do some word-breaking and tokenize each section separately.

so how then is quantity:5 different than quantity5 or quantity 5? perhaps clip would use a different break between them, but none of them are really documented behavior. so why would writing quantity:5 be specially supported in sdnext when it's not in clip itself?

Iluvalar commented 7 months ago

I'm sorry, I gave you a list of tokens most similar to ":-)". Here is ":"

Similar tokens: :(281) .(269) ,(267) !(256) ;(282) #(258) -(268) ?(286) "(257) ((263) @(287) ":(7811) ...(678) =(284) ):(4143) ':(7182) !:(17545) :"(12089) ."(1081) :(25) for(556) of(539) !!(748) at(536) 's(568) :-(10323) and(537) is(533) to(531) !!!(995)

It has the same magnitude as other characters like "a", "b" or "!"

Those AI learned to speak English via training. ":" itself is a token. I'm not sure why you expect it to behave differently?

I don't understand why. I can use the character "🧛‍♀️" to make a female vampire, and "🧛🏻‍♀️" to make one with slightly paler skin. Since clip & SD learned the skin modifier character, why would I doubt that they learned something regarding a common character like ":"?

Iluvalar commented 7 months ago

Good news! I found a viable workaround! I made myself an empty embedding. (0.0001% of "," to be specific, because pure empty didn't save)

This way I can make all the blue cute kitties I want with the prompt: "(:3 emb_empty:1.5) blue". I guess it also proves that the character combination yields something unique.

[Image: 00141-1700036486, prompt "(:3 emb_empty:1.5) blue"]

[Image: 00142-3957597353, prompt "(:3 emb_empty:1.5) blue"]

This works because the code I showed in the opening message tests whether everything to the right of the ":" character is numeric. So by adding an empty embedding I manage to do everything I'd want from this.
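The mechanism behind this workaround (and behind the trailing-comma hack mentioned earlier) can be illustrated with a small sketch. WEIGHT_RE and explicit_weights are hypothetical names, and the pattern is an assumption modeled on the idea that a colon counts as a weight only when nothing but a number sits between it and the closing bracket. It is not copied from the linked parser.

```python
import re
from typing import List

# Assumption: a colon is a weight only if followed by a number and then ")".
WEIGHT_RE = re.compile(r":\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*\)")


def explicit_weights(prompt: str) -> List[float]:
    """Return every number the pattern would treat as an explicit weight."""
    return [float(m.group(1)) for m in WEIGHT_RE.finditer(prompt)]


print(explicit_weights("(:3)"))                     # ":3" is consumed as a weight
print(explicit_weights("(:3,)"))                    # trailing comma blocks the match
print(explicit_weights("(:3 emb_empty:1.5) blue"))  # only the final 1.5 is a weight
```

In the last prompt the first colon is followed by "3 emb_empty", which is not purely numeric up to the bracket, so ":3" stays in the text and only ":1.5)" is read as a weight.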