v0xie / sd-webui-incantations

Enhance Stable Diffusion image quality, prompt following, and more through multiple implementations of novel algorithms for the Automatic1111 WebUI.
GNU General Public License v3.0

better interface for T2I-Zero? #58

Open · drhead opened this issue 1 week ago

drhead commented 1 week ago

From the small amount of experimentation I've done with T2I-Zero, it seems to have a fair amount of potential, but right now it's really hard to use: I have to count tokens in the prompt by hand and fix the token indices every time I change the prompt.

I think the ideal method of control would be to parse part of the prompt as a marker, similar to how sd-dynamic-prompts works (and this would be far less complex than that, since a token either has T2I-Zero applied or it doesn't). There may be alternative approaches involving custom UI elements, which may or may not be easier to implement (stable-diffusion-webui-tokenizer would be where I'd start for that).
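A minimal sketch of the marker idea, assuming a hypothetical `!!text!!` syntax (not an existing feature of this extension or of sd-dynamic-prompts): the parser strips the markers and returns the token positions of the marked text, so nobody has to count tokens by hand. `parse_marked_prompt` and `toy_tokenize` are illustrative names; the toy tokenizer (one token per word) stands in for the real CLIP tokenizer, and the offset of 1 assumes position 0 holds the start-of-text token.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical marker syntax (not an existing feature): !!text!! marks the span
# whose tokens T2I-Zero should act on. Chosen to avoid A1111's ()/[] emphasis
# and sd-dynamic-prompts' {a|b} syntax.
MARKER = re.compile(r"!!(.+?)!!")


def parse_marked_prompt(
    prompt: str,
    tokenize: Callable[[str], List[int]],
    offset: int = 1,  # assumption: position 0 holds the start-of-text token
) -> Tuple[str, List[int]]:
    """Strip the markers and return (clean_prompt, token positions of the marked spans)."""
    clean_parts: List[str] = []
    positions: List[int] = []
    cursor = offset  # token position where the next segment starts
    last = 0
    for match in MARKER.finditer(prompt):
        plain, marked = prompt[last:match.start()], match.group(1)
        cursor += len(tokenize(plain))  # skip over the unmarked text
        n = len(tokenize(marked))
        positions.extend(range(cursor, cursor + n))  # record the marked tokens
        cursor += n
        clean_parts += [plain, marked]
        last = match.end()
    clean_parts.append(prompt[last:])
    return "".join(clean_parts), positions


def toy_tokenize(text: str) -> List[int]:
    """Stand-in for the real CLIP tokenizer: one dummy token id per whitespace-separated word."""
    return [0 for _ in text.split()]


if __name__ == "__main__":
    print(parse_marked_prompt("a photo of a !!red cube!! on a table", toy_tokenize))
    # ('a photo of a red cube on a table', [5, 6])
```

A real tokenizer callable would need to return ids without start/end tokens. Because each segment is tokenized on its own, a BPE tokenizer could split a word differently if a marker lands mid-word, and A1111's 75-token chunking and the BREAK keyword would shift positions further; a real implementation would have to account for both.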

edit: Also, somewhat related: I don't think there's any distinction between the positive and negative prompt, so t2i0 seems to always be applied to the negative prompt too. This also causes a device-side assert if the positive prompt is longer than the negative prompt and padding for the negative prompt is not enabled. (This really needs to be replaced with cross-attention masking on A1111's side so it stops being an issue; a lot of extensions have problems with this.) Regardless of whether it crashes, there needs to be some way of filtering out the negative prompt, which may itself require forcing cond/uncond to be unbatched.
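A toy sketch of what cross-attention masking would buy here: the shorter negative-prompt context is padded to the positive context's length, and the padded key positions are excluded from the softmax, so padding no longer changes the attention output. This is pure Python with scalar (1-D) keys and values for readability; `masked_softmax`, `pad_context`, and `attend` are hypothetical names, not A1111's or the extension's code.

```python
import math
from typing import List, Optional, Tuple


def masked_softmax(scores: List[float], mask: Optional[List[bool]] = None) -> List[float]:
    """Softmax over attention scores; positions where mask is False get weight 0."""
    if mask is None:
        mask = [True] * len(scores)
    m = max(s for s, keep in zip(scores, mask) if keep)  # max over kept scores, for stability
    exps = [math.exp(s - m) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def pad_context(seq: List[float], target_len: int,
                pad_value: float = 0.0) -> Tuple[List[float], List[bool]]:
    """Pad a context sequence to target_len; the mask marks real (True) vs padded (False) slots."""
    n = len(seq)
    padded = seq + [pad_value] * (target_len - n)
    mask = [True] * n + [False] * (target_len - n)
    return padded, mask


def attend(query: float, keys: List[float], values: List[float],
           mask: Optional[List[bool]] = None) -> float:
    """Toy single-head cross-attention with scalar queries, keys and values."""
    scores = [query * k for k in keys]
    weights = masked_softmax(scores, mask)
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    cond_len = 6                 # length of the longer positive-prompt context
    neg_keys = [0.5, -1.0, 2.0]  # shorter negative-prompt context
    neg_vals = [1.0, 2.0, 3.0]

    reference = attend(1.0, neg_keys, neg_vals)
    pk, mask = pad_context(neg_keys, cond_len)
    pv, _ = pad_context(neg_vals, cond_len)

    print(attend(1.0, pk, pv, mask) == reference)  # True: masked padding leaves the output unchanged
    print(attend(1.0, pk, pv) == reference)        # False: unmasked padding shifts the weights
```

With the mask in place, cond and uncond could share one padded batch without the negative prompt's padding leaking into the result; filtering T2I-Zero out of the uncond rows would still be a separate step.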