huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Severe difference with A1111 #7850

Closed · alexblattner closed this issue 6 months ago

alexblattner commented 6 months ago

Describe the bug

I am using a custom model. On A1111 it is far more colorful than with diffusers. I am aware that it's impossible to replicate images exactly between the two given the same input, but this observation holds across many examples.

Here are results for diffusers with one prompt across guidance scales: (three images)

The same for A1111: (five images)

I know what you may say: it's unscientific and all, but this is my experience across multiple images, with ControlNet and IP-Adapter and without. On A1111, the output is consistently closer to the model's style, while diffusers produces less color and tends toward realism (it also burns more frequently).

Reproduction

I can't give you a code snippet. It's just a basic comparison with A1111 results using heavily stylized models.

Logs

No response

System Info


Who can help?

@DN6 @yiyixuxu Sorry for the very vague issue; I wish I could do better.

asomoza commented 6 months ago

Hi, maybe you can't share code, but could you share the prompt, model, and parameters? I can generate a lot of images, but I won't know how they differ from what you're getting.

Also, I don't get your comparison: the diffusers example is a portrait of a man and the auto1111 one is a woman in a portrait and half-body mix, so you're not even using the same prompt? To compare them, you should at least fix all of the generation parameters, even if the two won't generate the same image.

Just using a low-res image of what you generated with IP Adapters, I can get the saturation and style without problems.

(four generated images)

I could probably do better if I had the prompt and the style you're using.

alexblattner commented 6 months ago

@asomoza You are correct. Here's the model: https://drive.google.com/file/d/10GCQNP13YIuw8dX8zyAztUY-_lq5JH8m/view?usp=drive_link

prompt: "1boy, brown hair, waltz with bashir style, archer style" negative_prompt: "(worst quality, low quality),childlike, petite, loli," steps: 30 guidance_scale: 7.5 ip_scale: 1 ip_s_scale: 1 ip adapter: ip-adapter-faceid-plusv2_sd15.bin ip_image: Screenshot_2024-05-02_150906

The model is in diffusers format, so from_pretrained will work on it. I don't have it in an A1111 format at the moment, but I doubt you'd want to download the same model twice anyway.
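For reference, a rough sketch of this setup following the IP-Adapter FaceID flow in the diffusers docs. The "face.png" path stands in for the ip_image, the insightface settings are the docs' defaults, and for brevity it loads the base FaceID weight; the plusv2 weight listed above additionally needs CLIP image embeds wired in per the docs:

import cv2
import torch
from diffusers import StableDiffusionPipeline
from insightface.app import FaceAnalysis

pipe = StableDiffusionPipeline.from_pretrained(
    "./models/poselabsv12", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter-FaceID",
    subfolder=None,
    weight_name="ip-adapter-faceid_sd15.bin",  # base FaceID weight for brevity
    image_encoder_folder=None,
)
pipe.set_ip_adapter_scale(1.0)  # ip_scale: 1

# FaceID consumes an insightface face embedding instead of a raw image
app = FaceAnalysis(name="buffalo_l", providers=["CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))
face = app.get(cv2.imread("face.png"))[0]  # "face.png" stands in for the ip_image
embeds = torch.from_numpy(face.normed_embedding).reshape(1, 1, 1, -1)
# Stack a zero embedding first for the negative (classifier-free guidance) pass
id_embeds = torch.cat([torch.zeros_like(embeds), embeds]).to("cuda", dtype=torch.float16)

image = pipe(
    prompt="1boy, brown hair, waltz with bashir style, archer style",
    negative_prompt="(worst quality, low quality),childlike, petite, loli,",
    num_inference_steps=30,
    guidance_scale=7.5,
    ip_adapter_image_embeds=[id_embeds],
).images[0]
image.save("out.png")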

asomoza commented 6 months ago

I'm kind of curious how you tested the model with auto1111 if you don't have a compatible version, but anyway, I had a suspicion about this: most of the time when you get those kinds of images with SD 1.5, the cause is the VAE.

So I just switched the VAE and it worked; I didn't even have to test with IP Adapters.

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Swap in a known-good VAE in place of the one baked into the custom model
vae = AutoencoderKL.from_single_file(
    "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

pipe = StableDiffusionPipeline.from_pretrained(
    "./models/poselabsv12", torch_dtype=torch.float16, vae=vae
).to("cuda")
Original vs. switched VAE: (two images)

So I recommend you switch your VAE for one of the good ones from the popular models. I tested with this one, which is not in diffusers format, only because I was already testing another SD 1.5 pipeline.
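If you'd rather stay entirely in diffusers format, the same VAE is also published on the Hub as stabilityai/sd-vae-ft-mse, so a from_pretrained load should work too (my suggestion; not something tested in this thread):

import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Same ft-MSE VAE, already converted to diffusers format
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "./models/poselabsv12", torch_dtype=torch.float16, vae=vae
).to("cuda")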

alexblattner commented 6 months ago

I'll try that out. I'm working with someone who uses A1111, which is why I'm in this situation.

alexblattner commented 6 months ago

That was the issue, thank you!

alexblattner commented 4 months ago

The real issue was that the FaceID strength is higher in diffusers than in A1111; apply the LoRA at -0.5 to fix it (sketch below).
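A minimal sketch of that adjustment, continuing from the pipe defined earlier in the thread and assuming the FaceID LoRA file from the h94/IP-Adapter-FaceID repo loaded as a named adapter (the repo and file names are my assumptions; the -0.5 value is what's reported above):

# Weaken the FaceID LoRA so its effective strength matches A1111
pipe.load_lora_weights(
    "h94/IP-Adapter-FaceID",
    weight_name="ip-adapter-faceid-plusv2_sd15_lora.safetensors",
    adapter_name="faceid",
)
pipe.set_adapters(["faceid"], adapter_weights=[-0.5])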