vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Feature]: add "Alternating words" like [cow|horse] to diffusers prompt parser #2712

Open r7vz9h3 opened 6 months ago

r7vz9h3 commented 6 months ago

Issue Description

Expected behavior:

If SDNext intends to keep compatibility with this A1111 feature, I would expect:

  1. Alternating words like [cow|horse] should change the prompt between each denoising step, alternating between the two.
  2. Prompt editing like [from:to:when] should replace from with to after step when if when is an integer, or after the fraction when of the total steps if when is between 0 and 1.
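
To make the two expectations above concrete, here is a minimal sketch of how a prompt could be resolved for a given denoising step. It is not the A1111 or SD.Next parser: the function name `prompt_at_step`, the regexes, the 0-based step index and the rule "values below 1 are fractions" are all assumptions made for illustration, and nesting, weights and escapes are ignored.

```python
import re

# Hypothetical patterns, for illustration only (no nesting or escaping).
ALTERNATE = re.compile(r"\[([^\[\]:|]+(?:\|[^\[\]:|]*)+)\]")
EDIT = re.compile(r"\[([^\[\]:|]*):([^\[\]:|]*):([0-9]*\.?[0-9]+)\]")


def prompt_at_step(prompt: str, step: int, total_steps: int) -> str:
    """Return the prompt text that applies at a 0-based denoising step."""

    def alternate(match: re.Match) -> str:
        # [cow|horse]: cycle through the options, one per step.
        options = match.group(1).split("|")
        return options[step % len(options)]

    def edit(match: re.Match) -> str:
        # [from:to:when]: use "from" before the switch point, "to" afterwards.
        before, after = match.group(1), match.group(2)
        when = float(match.group(3))
        # Assumption: values below 1 are a fraction of the total steps,
        # anything else is an absolute step count.
        switch = when * total_steps if when < 1 else when
        return before if step < switch else after

    return EDIT.sub(edit, ALTERNATE.sub(alternate, prompt))


if __name__ == "__main__":
    for s in range(4):
        print(s, prompt_at_step("painting of a bowl of [apples|oranges], still life", s, 30))
```

Under this reading, the text encoder would see a different prompt at each step, which is why the result is expected to look like a blend rather than one subject or the other.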

But SDNext's Wiki contradicts this in the "Advanced Prompt Modifiers" section:

Alternate between words: "[word1|word2]" if batch is 2, it will generate one image using word1 and one image using word2

Observed behavior:

Neither of these syntaxes does anything, even in safe mode with no extensions running.

Examples:

Steps: 30, Seed: 617120954, Sampler: Euler, CFG scale: 5, Size: 512x512, Parser: Full parser,
Model: v1-5-pruned-emaonly, Model hash: 6ce0161689, Backend: Diffusers, App: SD.Next,
Version: ab7b78c, Operations: txt2img, Pipeline: StableDiffusionPipeline

painting of a bowl of oranges, still life

painting of a bowl of apples, still life

painting of a bowl of [apples|oranges], still life

Should be a blend of oranges and apples.

painting of a bowl of [apples:oranges:0.3], still life

Should be a blend of oranges and apples - or apple-shaped oranges.

painting of a bowl of [apples|oranges], still life, batch 2

If SDNext wiki is right, the 2nd image should show oranges.
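
For comparison, the wiki's per-batch reading quoted above could be sketched like this. Again this is only an illustration: `prompts_for_batch` and the regex are hypothetical names, not SD.Next code, and the wrap-around when the batch is larger than the number of options is an assumption.

```python
import re

# Hypothetical pattern for [word1|word2|...], illustration only.
ALTERNATE_WORDS = re.compile(r"\[([^\[\]:|]+(?:\|[^\[\]:|]*)+)\]")


def prompts_for_batch(prompt: str, batch_size: int) -> list[str]:
    """Expand [a|b] into one fixed prompt per image in the batch."""

    def resolve(index: int) -> str:
        def pick(match: re.Match) -> str:
            options = match.group(1).split("|")
            # Assumption: wrap around if the batch exceeds the option count.
            return options[index % len(options)]

        return ALTERNATE_WORDS.sub(pick, prompt)

    return [resolve(i) for i in range(batch_size)]


if __name__ == "__main__":
    for line in prompts_for_batch("painting of a bowl of [apples|oranges], still life", 2):
        print(line)
```

With batch 2 this gives one apples prompt and one oranges prompt, each constant across all steps, so no blending would occur within a single image.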

Version Platform Description

12:19:15-006851 INFO     Starting SD.Next
12:19:15-009855 INFO     Logger: file="E:\ai\am\sdnext.log" level=INFO size=1439669 mode=append
12:19:15-010855 INFO     Python 3.10.11 on Windows
12:19:15-163348 INFO     Version: app=sd.next updated=2023-12-30 hash=ab7b78cc
                         url=https://github.com/vladmandic/automatic/tree/master
12:19:15-394140 INFO     Platform: arch=AMD64 cpu=Intel64 Family 6 Model 165 Stepping 5, GenuineIntel system=Windows
                         release=Windows-10-10.0.22621-SP0 python=3.10.11
12:19:15-400647 INFO     nVidia CUDA toolkit detected: nvidia-smi present
12:19:15-455473 WARNING  Modified files: ['repositories/BLIP/BLIP.gif', 'repositories/CodeFormer/.gitignore']
12:19:15-502838 INFO     Extensions: disabled=['sd-webui-controlnet', 'stable-diffusion-webui-images-browser',
                         'multidiffusion-upscaler-for-automatic1111', 'sd-webui-sendtonegative']
12:19:15-504839 INFO     Extensions: enabled=['clip-interrogator-ext', 'Lora', 'sd-extension-chainner',
                         'sd-extension-system-info', 'sd-webui-agent-scheduler', 'stable-diffusion-webui-rembg']
                         extensions-builtin
12:19:15-509345 INFO     Startup: quick launch
12:19:15-510344 INFO     Verifying requirements
12:19:15-515856 INFO     Verifying packages
12:19:15-517359 INFO     Extensions: disabled=['sd-webui-controlnet', 'stable-diffusion-webui-images-browser',
                         'multidiffusion-upscaler-for-automatic1111', 'sd-webui-sendtonegative']
12:19:15-518362 INFO     Extensions: enabled=['clip-interrogator-ext', 'Lora', 'sd-extension-chainner',
                         'sd-extension-system-info', 'sd-webui-agent-scheduler', 'stable-diffusion-webui-rembg']
                         extensions-builtin
12:19:15-523362 INFO     Running in safe mode without user extensions
12:19:15-530359 INFO     Extension preload: {'extensions-builtin': 0.0}
12:19:15-532358 INFO     Command line args: ['--autolaunch', '--safe'] autolaunch=True safe=True
12:19:20-612468 INFO     Load packages: torch=2.1.0+cu121 diffusers=0.25.0 gradio=3.43.2
12:19:21-300955 INFO     Engine: backend=Backend.DIFFUSERS compute=cuda mode=no_grad device=cuda
                         cross-optimization="Scaled-Dot-Product"
12:19:21-361802 INFO     Device: device=NVIDIA GeForce RTX 4090 n=1 arch=sm_90 cap=(8, 9) cuda=12.1 cudnn=8801 driver=546.01

Relevant log output

No response

Backend

Diffusers

Branch

Master

Model

SD 1.5

r7vz9h3 commented 5 months ago

Original backend gives correct results, but I think the wiki is wrong on the intended behavior of [a|b] with batches.

Steps: 30, Seed: 617120954, Sampler: Euler, CFG scale: 5, Size: 512x512, Parser: Full parser,
Model: v1-5-pruned-emaonly, Model hash: 6ce0161689, Backend: Original, App: SD.Next,
Version: ab7b78c, Operations: txt2img, Variation seed: [null], Variation strength: [null],
Sampler brownian: False, Sampler discard: False, Sampler karras: False, Sampler low order: False

painting of a bowl of oranges, still life & painting of a bowl of apples, still life

painting of a bowl of [apples|oranges], still life & painting of a bowl of [apples:oranges:0.3], still life

vladmandic commented 5 months ago

best if you can propose edits to wiki directly.

r7vz9h3 commented 5 months ago

Since the recent big update, something changed. Now just adding the [a|b] syntax to a prompt breaks the generation, resulting in grey, blurry and distorted images.

painting of a bowl of apples, still life [a|b]

Steps: 30, Seed: 617120954, Sampler: Euler, CFG scale: 5, Size: 512x512, Parser: Full parser,
Model: v1-5-pruned-emaonly, Model hash: 6ce0161689, Backend: Diffusers, App: SD.Next,
Version: 4187692, Operations: txt2img, Sampler options: epsilon, Pipeline: StableDiffusionPipeline

Using painting of a bowl of [apples|oranges], still life gives an almost identical result.

This is running in safe mode with no extensions btw. Was there an attempt at re-implementing this syntax?