vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0
5.31k stars 379 forks

[Feature]: Prompt downweighting in original backend #2441

Open ryukra opened 8 months ago

ryukra commented 8 months ago

Issue Description

Downweighting is not working as intended. Weighting a tag below 1.0 (e.g. (tag:0.5)) doesn't behave the way most users expect: down-weighted tags still influence the rest of the prompt. Fixing this would improve a ton of prompts because it gives you much finer control.
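For context, a1111-style weighting (which the original backend inherits) scales a token's encoder output by its weight and then optionally restores the overall mean magnitude. Here is a toy sketch of why that keeps a down-weighted tag's influence around; the mock embedding matrix and function names are illustrative, not SD.Next code:

```python
import numpy as np

def naive_downweight(token_embs, weights, normalize_mean=True):
    """Sketch of a1111-style weighting: scale each token's embedding by its
    weight, then optionally restore the tensor's original mean magnitude
    (the "mean normalization" idea). Mock data, not SD.Next code."""
    emb = token_embs * np.asarray(weights)[:, None]
    if normalize_mean:
        emb = emb * (np.abs(token_embs).mean() / np.abs(emb).mean())
    return emb

rng = np.random.default_rng(0)
embs = rng.normal(size=(4, 8))                    # mock embeddings for 4 tokens
weighted = naive_downweight(embs, [1.0, 0.5, 1.0, 1.0])

# the (tag:0.5) token is scaled, not removed: its embedding keeps the exact
# same direction, so it still steers cross-attention toward the same concept
cos = weighted[1] @ embs[1] / (np.linalg.norm(weighted[1]) * np.linalg.norm(embs[1]))
assert abs(cos - 1.0) < 1e-9
```

Since scaling preserves direction, a (tag:0.5) token still pulls generation the same way, just with a smaller (and after mean normalization, partly restored) magnitude.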

A ComfyUI custom node and InvokeAI implement downweighting by creating embeddings for every tag and weighting them individually. Explanation from the custom node: https://github.com/BlenderNeko/ComfyUI_ADV_CLIP_emb

The functions are here: https://github.com/BlenderNeko/ComfyUI_ADV_CLIP_emb/blob/master/adv_encode.py

The best algorithm is comfy++:

    if weight_interpretation == "comfy++":
        # interpolate down-weighted tokens toward an embedding with those tokens masked out
        weighted_emb, tokens_down, _ = down_weight(unweighted_tokens, weights, word_ids, base_emb, length, encode_func)
        # clamp weights below 1.0 so the second pass only applies up-weighting
        weights = [[w if w > 1.0 else 1.0 for w in x] for x in weights]
        #unweighted_tokens = [[(t,1.0) for t, _,_ in x] for x in tokens_down]
        embs, pooled = from_masked(unweighted_tokens, weights, word_ids, base_emb, length, encode_func)
        weighted_emb += embs

I don't know exactly what the masking does. Probably it keeps a down-weighted embedding from influencing the other tags, so the weighting stays correct? Like with "blue hair": if you down-weight "blue", the hair still stays blue because the full phrase "blue hair" is still encoded?
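For illustration, here is a minimal sketch of the interpolate-toward-masked idea described above (the mock encoder and helper names are mine, not the actual adv_encode.py API): encode the prompt once as-is and once with the down-weighted token replaced by padding, then linearly interpolate between the two results by the weight, so a weight of 0 removes the tag's contribution entirely:

```python
import zlib
import numpy as np

def mock_encode(tokens, dim=8):
    """Stand-in for a CLIP text encoder: one deterministic vector per token.
    (A real encoder is contextual; this only illustrates the arithmetic.)"""
    out = np.zeros((len(tokens), dim))
    for i, tok in enumerate(tokens):
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        out[i] = rng.normal(size=dim)
    return out

PAD = "<pad>"  # hypothetical padding token used to mask a word out

def down_weight_sketch(tokens, weights):
    """For each weight < 1.0, interpolate between the full encoding and an
    encoding with that token masked, so weight 0 removes it completely."""
    out = mock_encode(tokens)
    for i, w in enumerate(weights):
        if w < 1.0:
            masked = list(tokens)
            masked[i] = PAD
            out = w * out + (1.0 - w) * mock_encode(masked)
    return out

tokens = ["blue", "hair"]
full = mock_encode(tokens)
gone = down_weight_sketch(tokens, [0.0, 1.0])   # like (blue:0.0)
half = down_weight_sketch(tokens, [0.5, 1.0])   # like (blue:0.5)

# weight 0 leaves no trace of "blue": the slot equals the masked encoding
assert np.allclose(gone[0], mock_encode([PAD, "hair"])[0])
# weight 0.5 sits exactly halfway between the full and masked encodings
assert np.allclose(half[0], 0.5 * full[0] + 0.5 * mock_encode([PAD, "hair"])[0])
```

With naive scaling, a 0.5 weight only shrinks the token's vector; with this interpolation, the weight directly controls how much of the masked-out encoding replaces it. In a real contextual encoder, "hair" in "blue hair" still carries "blue" through attention, which may be what the masking is about.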

Version Platform Description

No response

Relevant log output

No response

Backend

Original

Model

SD 1.5

Acknowledgements

brknsoul commented 8 months ago

SD.Next has an option in Settings "Prompt attention mean normalization" which might be altering your weights.

ryukra commented 8 months ago

No that doesn't change it.

vladmandic commented 8 months ago

you've said this is on backend:original, how is it on backend:diffusers? prompt weighting there is done completely differently.

ryukra commented 8 months ago

true, the diffusers backend works as intended! is it hard to add it into original?

vladmandic commented 8 months ago

quite difficult, and i'm not really motivated to do such major work on the original backend. anything i touch there causes a near-riot since "results-are-not-identical-to-what-they-used-to-be". so i would have to leave the new method disabled by default, which means it would only be enabled & used by a handful of users, which doesn't justify the time invested in the first place. i still remember when i fixed the prompt parser (the a1111 parser makes a mess in some corner cases) and it caused a riot because images were not exactly identical to before.

ryukra commented 8 months ago

most users don't know that downweighting isn't working the way they think, which is why it's only really used by a handful of users. everyone I tell this to is shocked that it basically doesn't work except for some tags. diffusers is sadly lacking some cool features. you can keep the issue open in case someone wants to implement this, or close it since it's kinda already there.

vladmandic commented 8 months ago

diffusers is lacking some cool features sadly.

which ones? i'd rather spend energy getting that side to be fully on-par.

ryukra commented 8 months ago

dunno if that's less work than the downweighting.. the controlnet extension, but more specifically the IP adapter, where you can use images as prompts. the implementation in the controlnet extension isn't really that good; the way fooocus does it is way better, you can easily use up to 4 images as a prompt and it enables a ton of hard concepts, but of course it's using a different base. I don't know of any alternative for that. diffusers also seems to be missing samplers, for example DPM++ 2M SDE, which I like, and fooocus also uses the "gpu" version as its base sampler. DPM SDE seems to be broken right now because the seed doesn't do anything.

vladmandic commented 8 months ago

i'm already working on a native controlnet implementation, this is the prototype (it's not ready for integration with sdnext yet, it's still missing a few items): https://github.com/vladmandic/control

Holyniwa commented 8 months ago

"and not really motivated to do such major work on original backend" Does this mean that unless some massive program-breaking bug pops up, original backend has effectively reached its final version?

"i'd rather spend energy getting that side to be fully on-par." not sure it was explicitly mentioned before, but sounds like XL is the priority now.

i would like to upgrade, since i'm on the Oct 12th version, but when i tried upgrading to a recent version, a number of features seemed to break (even after trying to update their extensions). i can't remember them all, but i remember the Civithelper tab not showing up and Lycoris weight blocks not working at all. not sure if any meaningful updates were added to the backend during this time, or if it's just been XL-focused since then.

vladmandic commented 8 months ago

Does this mean that unless some massive program-breaking bug pops up, original backend has effectively reached its final version?

a bit more than that. new features are being added to support extensions, as the extension ecosystem is massive on that side.

not sure it was explicitly mentioned before, but sounds like XL is the priority now.

far from it. the diffusers backend supports sd, sdxl and about 12 other model types.

i remember Civithelper tab not showing up

pretty much all important civithelper functionality is built-in, so i don't see the point of that extension?

Lycoris weight blocks not working at all. not sure

lora is now a meta-network that decides when to use lora/lyco/ia-3/locon/etc. - there are too many network types to have a separate namespace for each.

if any meaningful updates were added to backend during this time, or if its just been XL focused since then.

it's all documented in the changelog.

vladmandic commented 7 months ago

converted to a feature request, as this was never present in a1111 or the original backend.