vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0
5.71k stars 425 forks

[Issue]: Extremely poor quality of generations through diffusers backend #2833

Closed. bkosowski closed this issue 9 months ago.

bkosowski commented 9 months ago

Issue Description

Images generated with the diffusers backend are of considerably worse quality, even when I match the settings as closely as possible. This happens with SD 1.5-based models.

The two images speak for themselves. Here is an image generated through the original backend:

[image: 00005-CyberRealistic_Classic_v17-A woman standing on a side walk 40]

And here is an image generated through the diffusers backend:

[image: 00003-CyberRealistic_Classic_v17-A woman standing on a side walk 40]

The quality loss is extreme!

Generation information:

- Model: CyberRealisticClassic v1.7
- Positive prompt:
A woman standing on a side walk, 40 years old, wearing a business suit, wearing glasses,
complex background, messy, detailed background, raining, cloudy day,
masterpiece, photorealistic, realistic, hyperrealistic, hyperdetailed, ultrarealistic, ultra highres, 4k, HDR, high detail, high quality, studio photo, professional photography, extra details, very detailed, intricate fine detail
- Negative prompt:
BeyondNegative_v4-neg, CyberRealistic_Negative,
(portrait:1.4), (closeup:1.4),
censored, deformed, grotesque, amputated, disfigured, mutilated, bad anatomy, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, fused legs, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, abnormal hands, abnormal legs, abnormal feet, long feet, big feet, elongated feet, abnormal fingers, duplicated, mirrored, ugly, obese, fat,
(monochrome, grayscale:1.3), b&w, black and white, oversaturated, sepia,
(worst quality:2), (low quality:2), (normal quality:2), lowres, low resolution, jpeg artifacts, cropped, out of frame, canvas frame, border, frame, picture frame, (haze:1.2), (blur:1.2), (blurry:1.2), (unfocused:1.2), (depth of field:1.3), text, error, username, signature, watermark, logo,
rendered, 3D render, Octane render, Cinema 4D, Blender, Unreal, Unreal Engine, 3ds Max, Maya, Milkshape 3D, Unity, CG, CGI, computer graphics, computer generated image, computer animation, video game, trending on CGSociety, drawing, cartoon, painting, illustration, anime, sketch,
- First pass:
  - Sampler: DPM++ 2M karras
  - Steps: 35
  - CFG: 7
  - Seed: 1760320592

- Second pass:
  - Sampler: Euler a
  - Denoising: 0.5
  - Upscaler: Latent Nearest
  - Hires steps: 20
  - Upscale by: 2
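"Upscaler: Latent Nearest" means the second pass upscales the latent tensor rather than the decoded image. The minimal sketch below assumes the SD 1.5 VAE downsamples by 8 into 4 latent channels; the base resolution is not stated in the issue, so 512x512 is purely illustrative, and `latent_shape` and `nearest_upscale` are hypothetical helpers, not SD.Next functions.

```python
# Hypothetical sketch: latent shapes and nearest-neighbour upscaling for SD 1.5.
# Assumptions: the VAE downsamples by 8 into 4 latent channels; 512x512 is illustrative.

def latent_shape(width: int, height: int, channels: int = 4, factor: int = 8) -> tuple:
    """Return (channels, latent_height, latent_width) for an image of width x height pixels."""
    if width % factor or height % factor:
        raise ValueError("width and height must be multiples of the VAE factor")
    return (channels, height // factor, width // factor)


def nearest_upscale(grid: list, scale: int) -> list:
    """Upscale one 2D latent channel by repeating each value in a scale x scale block."""
    out = []
    for y in range(len(grid) * scale):
        src_row = grid[y // scale]  # nearest source row
        out.append([src_row[x // scale] for x in range(len(src_row) * scale)])
    return out


print(latent_shape(512, 512))    # (4, 64, 64)
print(latent_shape(1024, 1024))  # (4, 128, 128) after "Upscale by: 2"
print(nearest_upscale([[1.0, 2.0], [3.0, 4.0]], 2))
```

Because nearest-neighbour interpolation only repeats values, the upscaled latent is blocky, and the second pass (Euler a, denoising 0.5, 20 steps) has to repaint that detail; this is the usual reason latent upscalers are paired with fairly high denoising strengths.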

The exact same thing happens after launching with --reinstall on the command line, waiting for everything to reinstall, and restarting the server.

Am I doing something wrong? Is there some secret sauce that I'm missing?

Version Platform Description

Beginning of the log:

12:32:46-836498 INFO     Starting SD.Next
12:32:46-838499 INFO     Logger: file="D:\AIArt\automatic\sdnext.log" level=DEBUG size=65 mode=create
12:32:46-839498 INFO     Python 3.11.6 on Windows
12:32:46-935590 INFO     Version: app=sd.next updated=2024-02-09 hash=66ac9b20 url=https://github.com/vladmandic/automatic.git/tree/master
12:32:47-212376 INFO     Platform: arch=AMD64 cpu=AMD64 Family 25 Model 97 Stepping 2, AuthenticAMD system=Windows release=Windows-10-10.0.22631-SP0
                         python=3.11.6
12:32:47-216381 DEBUG    Setting environment tuning
12:32:47-216381 DEBUG    HF cache folder: C:\Users\gamer\.cache\huggingface\hub
12:32:47-216381 DEBUG    Torch overrides: cuda=True rocm=False ipex=False diml=False openvino=False
12:32:47-220381 DEBUG    Torch allowed: cuda=True rocm=False ipex=False diml=False openvino=False
12:32:47-220381 INFO     nVidia CUDA toolkit detected: nvidia-smi present

GPU:

device: NVIDIA GeForce RTX 4080 (1) (sm_90) (8, 9)
cuda: 12.1
cudnn: 8801
driver: 551.23

Torch:

2.2.0+cu121 Autocast  half

Libs:

xformers: 
diffusers: 0.26.2
transformers: 4.37.2

Device info:

active: cuda
dtype: torch.float16
vae: torch.float16
unet: torch.float16

Cross attention:

Scaled-Dot-Product

System-info tab: SDNext-system-info

Launch command line args for the original backend: --upgrade --use-cuda --models-dir "D:\AIArt\Models" --backend original --config config.json --ui-config ui-config.json --debug

Launch command line args for the diffusers backend: --upgrade --use-cuda --models-dir "D:\AIArt\Models" --backend diffusers --config config.json --ui-config ui-config.json --debug

Relevant log output

sdnext.log

Backend

Diffusers

Branch

Master

Model

SD 1.5

Acknowledgements

Disty0 commented 9 months ago

Prompt weighting on the Diffusers backend actually works, unlike on the Original backend, and you are using extreme prompt weights (up to 2).
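The weights in question use the A1111-style `(text:weight)` syntax. A minimal, simplified parser makes it easy to see which weights the original negative prompt requested. This is a hypothetical sketch, not SD.Next's actual prompt parser: it handles only explicit `(text:weight)` groups and ignores nesting, bare `(text)`/`[text]` emphasis, and escaped brackets.

```python
import re

# Matches explicit "(text:weight)" groups such as "(worst quality:2)".
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> list:
    """Split a prompt into (text, weight) pairs; unweighted text gets weight 1.0."""
    chunks = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        before = prompt[pos:m.start()].strip(" ,\n")  # plain text between groups
        if before:
            chunks.append((before, 1.0))
        chunks.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,\n")
    if tail:
        chunks.append((tail, 1.0))
    return chunks


neg = "(monochrome, grayscale:1.3), b&w, (worst quality:2), (low quality:2), lowres"
for text, weight in parse_weights(neg):
    print(f"{weight:>4} {text}")
```

A weight of 2 doubles the emphasis on that phrase relative to plain text, which is the kind of extreme range Disty0 is referring to.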

r7vz9h3 commented 9 months ago

Try regenerating on both backends with no negative prompt; these 1000-token-long "super universal negative prompts" likely do more harm than good anyway.
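For scale: the SD 1.5 CLIP text encoder sees 77 token positions, 75 of which are usable once the BOS and EOS tokens are counted. A1111-style front ends handle longer prompts by splitting them into 75-token chunks, encoding each chunk separately, and concatenating the results. The sketch below assumes that chunking scheme; the 1000-token figure is the commenter's rough estimate, and whether SD.Next's diffusers backend chunks in exactly this way is an assumption. The helper names are hypothetical.

```python
import math

CLIP_CONTEXT = 77           # positions in the SD 1.5 CLIP text encoder
USABLE = CLIP_CONTEXT - 2   # one position each for the BOS and EOS tokens


def chunk_count(num_tokens: int, chunk_size: int = USABLE) -> int:
    """Number of chunks a prompt of num_tokens tokens is split into (at least one)."""
    return max(1, math.ceil(num_tokens / chunk_size))


def embedding_length(num_tokens: int) -> int:
    """Sequence length of the concatenated text embedding, assuming A1111-style chunking."""
    return chunk_count(num_tokens) * CLIP_CONTEXT


print(chunk_count(1000), embedding_length(1000))  # 14 1078
```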

bkosowski commented 9 months ago

Changing the prompts to use no custom weights:

- Positive prompt:
A woman standing on a side walk, 40 years old, wearing a business suit, wearing glasses,
complex background, messy, detailed background, raining, cloudy day,
masterpiece, photorealistic, realistic, hyperrealistic, hyperdetailed, ultrarealistic, ultra highres, 4k, HDR, high detail, high quality, studio photo, professional photography, extra details, very detailed, intricate fine detail

- Negative prompt:
BeyondNegative_v4-neg, CyberRealistic_Negative,
portrait, closeup,
censored, deformed, grotesque, amputated, disfigured, mutilated, bad anatomy, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, fused legs, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, abnormal hands, abnormal legs, abnormal feet, long feet, big feet, elongated feet, abnormal fingers, duplicated, mirrored, ugly, obese, fat,
monochrome, grayscale, b&w, black and white, oversaturated, sepia,
worst quality, low quality, normal quality, lowres, low resolution, jpeg artifacts, cropped, out of frame, canvas frame, border, frame, picture frame, haze, blur, blurry, unfocused, depth of field, text, error, username, signature, watermark, logo,
rendered, 3D render, Octane render, Cinema 4D, Blender, Unreal, Unreal Engine, 3ds Max, Maya, Milkshape 3D, Unity, CG, CGI, computer graphics, computer generated image, computer animation, video game, trending on CGSociety, drawing, cartoon, painting, illustration, anime, sketch,

This also produces a broken image (compared with the original backend):

[image: 00010-CyberRealistic_Classic_v17-A woman standing on a side walk 40]

Disty0 commented 9 months ago

Try with just this on both backends:

Positive:

A woman standing on a side walk, 40 years old, wearing a business suit, wearing glasses,
complex background, messy, detailed background, raining, cloudy day,

Negative:

cartoon, painting, illustration, worst quality, low quality, normal quality

And I am assuming you are using DPM++ 2M with the Karras checkbox turned on.
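"Karras" selects the Karras et al. noise schedule rather than a different solver; in the diffusers library the equivalent switch is `use_karras_sigmas=True` on `DPMSolverMultistepScheduler`. The pure-Python sketch below computes that schedule. The `sigma_min`/`sigma_max` defaults are approximate SD 1.5 values and are an assumption here; k-diffusion's version also appends a final sigma of 0, which is omitted.

```python
def karras_sigmas(n: int, sigma_min: float = 0.0292, sigma_max: float = 14.6146,
                  rho: float = 7.0) -> list:
    """Return n noise levels from sigma_max down to sigma_min, spaced per Karras et al."""
    if n < 2:
        raise ValueError("need at least two steps")
    max_inv = sigma_max ** (1 / rho)
    min_inv = sigma_min ** (1 / rho)
    # Interpolate linearly in sigma**(1/rho) space, then raise back to the power rho.
    return [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]


sigmas = karras_sigmas(35)  # "Steps: 35" from the first pass
print([round(s, 3) for s in sigmas[:3]], [round(s, 4) for s in sigmas[-3:]])
```

With `rho = 7` the steps are concentrated near the low-noise end, so toggling the checkbox changes every sigma the sampler visits and gives a different image from the same seed.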

bkosowski commented 9 months ago

Using only the positive prompt leads to an image that is of much worse quality than the image generated through the original backend with the negative prompt:

[image: 00011-CyberRealistic_Classic_v17-A woman standing on a side walk 40]

Can't the diffusers backend really handle negative prompt?

Disty0 commented 9 months ago

Can't the diffusers backend really handle negative prompt?

No, it is doing exactly what you told it to do.
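"Doing exactly what you told it to do" is easiest to see in classifier-free guidance. When a negative prompt is given, its noise prediction takes the place of the unconditional one, and each step moves away from it by the CFG scale (7 here). The sketch uses plain lists instead of tensors; diffusers' Stable Diffusion pipelines apply the same formula, `noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)`. The function name is hypothetical.

```python
def guided_noise(noise_neg: list, noise_pos: list, cfg_scale: float) -> list:
    """Classifier-free guidance with a negative prompt, applied element-wise."""
    if len(noise_neg) != len(noise_pos):
        raise ValueError("predictions must have the same shape")
    # Start from the negative prediction and move toward the positive one,
    # amplified by the CFG scale (which also pushes away from the negative).
    return [n + cfg_scale * (p - n) for n, p in zip(noise_neg, noise_pos)]


print(guided_noise([0.0, 1.0], [1.0, 1.0], 7.0))  # [7.0, 1.0]
```

Where the two predictions agree, nothing changes; where they differ, the difference is amplified sevenfold. Everything in the negative prompt, weights included, therefore directly shapes the result.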

bkosowski commented 9 months ago

Adding the simple negative prompt:

cartoon, painting, illustration, worst quality, low quality, normal quality

leads to an image that is also of much worse quality than the one generated through the original backend with the long negative prompt (though, of course, much better than the image generated through the diffusers backend with the long negative prompt):

[image: 00012-CyberRealistic_Classic_v17-A woman standing on a side walk 40]

Yes, it's DPM++ 2M karras.

I understand that one can generate an image using different prompts. But the issue here is that the backend handles the prompts in a completely different way (worse, in my opinion). Is there any documentation that explains the differences?

Disty0 commented 9 months ago

The difference is that it actually works as intended. So put only the things you don't want in the image into the negative prompt, and don't throw in a soup of random words that doesn't make any sense.

Yes, it's DPM++ 2M karras.

Does it look like this? It should look like this:

[image]

If not, refresh the page after changing backends.

vladmandic commented 9 months ago

Thanks @Disty0, I agree. This may not be what @bkosowski is used to, but the goal here is not to be 100% like A1111. The prompt parser -> tokenizer -> text encoder chain is doing its job as intended, and it's actually much closer to how Comfy or Invoke handle prompts.
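Prompt weights are applied to the text encoder's output. A1111 multiplies each token's embedding by its weight and then rescales the whole result so its mean matches the unweighted mean. The simplified sketch below contrasts that with plain multiplication, using one float per token instead of a full embedding vector. The thread does not say which scheme, if either, each SD.Next backend uses, so any mapping to the two backends is an assumption, and both helper names are hypothetical.

```python
# Simplified: one float per token instead of a 768-dim vector per token.

def apply_weights_plain(emb: list, weights: list) -> list:
    """Scale each token's embedding by its prompt weight, with no renormalization."""
    return [e * w for e, w in zip(emb, weights)]


def apply_weights_mean_restored(emb: list, weights: list) -> list:
    """Scale by weight, then rescale so the overall mean equals the unweighted mean."""
    weighted = apply_weights_plain(emb, weights)
    original_mean = sum(emb) / len(emb)
    new_mean = sum(weighted) / len(weighted)
    if new_mean == 0:  # avoid division by zero; leave values unscaled
        return weighted
    return [x * (original_mean / new_mean) for x in weighted]


emb = [1.0, 1.0, 1.0, 1.0]       # toy encoder output for four tokens
weights = [2.0, 1.0, 1.0, 1.0]   # first token written as "(token:2)"
print(apply_weights_plain(emb, weights))          # [2.0, 1.0, 1.0, 1.0]
print(apply_weights_mean_restored(emb, weights))  # roughly [1.6, 0.8, 0.8, 0.8]
```

In both cases the weighted token ends up twice as strong as the others. The mean-restored variant, however, also shrinks the unweighted tokens, so the same prompt text can produce noticeably different conditioning depending on the scheme.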