seruva19 / kubin

Web-GUI for Kandinsky text-to-image diffusion models.

Strange result when using negative prompts #99

Closed Superbelko closed 1 year ago

Superbelko commented 1 year ago

Not sure what's happening, but when I use a negative prompt the image gets a strange cyan tint overlay, something like an anime-style gradient background. It is even more noticeable in photos and portraits, and more pronounced at a higher CFG guidance scale such as 7.0. The DDIM sampler shows the effect a bit less, but it is still noticeable.

Example prompt and negative prompt below (same seed, p-sampler). Even though "3d/cg" in the negative somewhat conflicts with this example, IIRC that is not the problem, since it happens with highly stylized prompts too.

It is somewhat less buggy with a simple negative such as "washed out colors, oversaturated", especially at a lower CFG like 3.0, but the image still clearly got brighter. I dunno, maybe it is supposed to be that way?

prompt: beautiful sphere, glossy skin, masterpiece, concept art, acute angle, hdr, sharp focus, forest background

negative: lowres, ((bad anatomy)), bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, black and white, red eyes, big eyes, long neck, picture_frame, cartoon, ((disfigured)), ((bad art)), ((deformed)), ((poorly drawn)), ((extra limbs)), ((b&w)), weird colors, blurry, ((ugly_face)), cg, 3d , 3d render

20230625101303_beautiful_sphere,_glossy_skin,_masterpiece,

20230625101632_beautiful_sphere,_glossy_skin,_masterpiece,

seruva19 commented 1 year ago

I don't have an explanation for this, just some thoughts:

  1. CLIP's 77-token limit also applies to negative prompts, so some words are not used when generating embeddings. As a result, the negative prompt effectively becomes just "lowres, ((bad anatomy)), bad hands, text, error, missing fingers, extra digit".
  2. There is no prompt weighting, so these extra parentheses are treated as noise (perhaps?).
  3. From my observations, the typical "tag-like" style of prompt that we got used to in SD does not work properly with Kandinsky. It expects more traditional, "narrative" prompts. I also noticed that words that come first in a prompt (before the first comma) have much STRONGER priority over subsequent words than in SD.
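
To get a rough feel for points 1 and 2, here is a minimal sketch that strips the emphasis parentheses and cuts a prompt to a token budget. It does not use CLIP's real BPE tokenizer: `rough_tokens` is a hypothetical stand-in that splits on words and punctuation, so real token counts will differ. None of these function names come from kubin or Kandinsky.

```python
import re

MAX_TOKENS = 77  # CLIP context length, including start/end markers


def strip_weighting(prompt):
    """Remove SD-style emphasis parentheses: ((bad anatomy)) -> bad anatomy."""
    return re.sub(r"[()]", "", prompt)


def rough_tokens(prompt):
    """Crude stand-in for CLIP's BPE tokenizer (assumption, not the real one):
    each run of letters/digits and each punctuation mark counts as one token."""
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", prompt)


def truncate(prompt, max_tokens=MAX_TOKENS):
    """Keep only the tokens that fit, reserving two slots for the
    start-of-text and end-of-text markers."""
    return rough_tokens(prompt)[: max_tokens - 2]


if __name__ == "__main__":
    negative = "lowres, ((bad anatomy)), bad hands, text, error"
    cleaned = strip_weighting(negative)
    print(cleaned)
    print(len(rough_tokens(negative)), "rough tokens before cleaning")
    print(len(rough_tokens(cleaned)), "rough tokens after cleaning")
```

With a splitter like this, every parenthesis costs a token, so a heavily "weighted" negative uses up the budget faster without adding any meaning.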

So, to actually find out whether a higher CFG has such a "definitive" effect on image output, I would experiment with simpler prompts. Otherwise, it is likely that the complexity of the prompt might be more impactful than CFG changes.
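
For reference, a sketch of the usual classifier-free guidance step, assuming the negative prompt's prediction takes the place of the unconditional one (the common formulation; whether Kandinsky's scripts do exactly this is not confirmed here). `cfg_combine` and the toy numbers are hypothetical and only illustrate why a higher scale would amplify whatever the negative prompt contributes.

```python
def cfg_combine(eps_negative, eps_positive, scale):
    """Classifier-free guidance on plain lists of floats:
    result = negative + scale * (positive - negative).
    Assumes the negative-prompt prediction stands in for the unconditional one."""
    return [n + scale * (p - n) for n, p in zip(eps_negative, eps_positive)]


if __name__ == "__main__":
    eps_negative = [0.0, 1.0]  # toy "noise prediction" for the negative prompt
    eps_positive = [1.0, 1.0]  # toy "noise prediction" for the positive prompt
    for scale in (1.0, 3.0, 7.0):
        print(scale, cfg_combine(eps_negative, eps_positive, scale))
```

Where the two predictions agree, the scale changes nothing; where they differ, the difference is multiplied by the scale, so a noisy negative embedding could plausibly push the output further at 7.0 than at 3.0.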

Overall, I think someone with deep knowledge of PyTorch could provide a comprehensive explanation of this behavior by step-by-step decomposition of the source scripts. Unfortunately, I cannot claim to be that person right now :)

seruva19 commented 1 year ago

In 2.2 this effect is not observed anymore.

Screenshot ![image](https://github.com/seruva19/kubin/assets/26826215/975d5950-550b-472b-a24e-7266c7896f72)