Hi, maybe you can't share code, but could you share the prompt, model, and parameters? I can generate a lot of images, but I won't know how they differ from what you're doing.
Also, I don't get your comparison: the diffusers example is a portrait of a man and the auto1111 one is a woman with a mix of portrait and half-body shots, so you're not even using the same prompt? To compare them, you should at least fix all the generation parameters, even if the two won't generate the same image.
Just using a low-res image of what you generated with IP-Adapters, I can get the saturation and style without problems. I could probably do better if I had the prompt and the style you're using.
@asomoza you are correct. Here's the model: https://drive.google.com/file/d/10GCQNP13YIuw8dX8zyAztUY-_lq5JH8m/view?usp=drive_link
prompt: "1boy, brown hair, waltz with bashir style, archer style" negative_prompt: "(worst quality, low quality),childlike, petite, loli," steps: 30 guidance_scale: 7.5 ip_scale: 1 ip_s_scale: 1 ip adapter: ip-adapter-faceid-plusv2_sd15.bin ip_image:
The model is in diffusers format, so `from_pretrained` will work on it. I don't have it in a format for A1111 at the moment, but I doubt you'd want to download the same model twice for that anyway.
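For reference, a minimal sketch of those settings using the standard diffusers API. The FaceID-Plus adapter additionally needs face embeddings from insightface, and the `ip_scale` / `ip_s_scale` parameters suggest a custom pipeline was used, so the IP-Adapter step is omitted here; treat this as an approximation, not the exact reproduction script.

```python
import torch
from diffusers import StableDiffusionPipeline

# Local model path taken from the snippet later in this thread.
pipe = StableDiffusionPipeline.from_pretrained(
    "./models/poselabsv12", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="1boy, brown hair, waltz with bashir style, archer style",
    negative_prompt="(worst quality, low quality),childlike, petite, loli,",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("out.png")
```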
I'm kind of curious about how you tested the model with auto1111 if you don't have a compatible version, but anyway, I had my suspicions about it: most of the time, when you get those kinds of images with SD 1.5, it's the VAE.
So I just switched the VAE and it worked; I didn't even have to test with IP-Adapters.
```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

# Swap in the fixed SD 1.5 ft-mse VAE from its single-file checkpoint.
vae = AutoencoderKL.from_single_file(
    "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors",
    torch_dtype=torch.float16,
).to("cuda")
pipe = StableDiffusionPipeline.from_pretrained(
    "./models/poselabsv12", torch_dtype=torch.float16, vae=vae
).to("cuda")
```
| original | switched vae |
| --- | --- |
So I recommend you switch your VAE for one of the good ones used in the popular models. I tested with this one, which is not in diffusers format, because I happened to be testing another SD 1.5 pipeline with it.
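If you'd rather stay in diffusers format, the same fixed VAE is also published as `stabilityai/sd-vae-ft-mse`; a sketch, assuming that repo id:

```python
import torch
from diffusers import AutoencoderKL

# Same ft-mse VAE, loaded from the diffusers-format repo instead of
# the single-file checkpoint above.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")
```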
I'll try that out. I'm working with someone who uses A1111, which is how I ended up in this situation.
It was the issue, thank you!
The real issue was that the FaceID strength is higher in diffusers than in A1111; applying the LoRA at -0.5 fixes it.
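A sketch of that fix, assuming the FaceID LoRA from the public h94/IP-Adapter-FaceID release; adjust the repo id and weight name to wherever your LoRA actually lives:

```python
# Load the FaceID-Plus v2 LoRA and fuse it at a negative scale to
# counteract the stronger FaceID effect in diffusers.
pipe.load_lora_weights(
    "h94/IP-Adapter-FaceID",
    weight_name="ip-adapter-faceid-plusv2_sd15_lora.safetensors",
)
pipe.fuse_lora(lora_scale=-0.5)
```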
Describe the bug
I am using a custom model. On A1111 it is far more colorful than in diffusers. I am aware that it's impossible to replicate images between the two given the same inputs, but my observation holds across many examples.
Here are results for diffusers with one prompt across guidance scales:
The same, but for A1111:
I know what you may say: it's unscientific and all, but this has been my experience across many images, with ControlNet and IP-Adapter and without. A1111 is consistently closer to the style, while diffusers has less color and tries to be more realistic (it also burns more frequently).
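A sketch of how such a sweep might be run on the diffusers side, assuming a pipeline `pipe` is already loaded; the prompt, seed, and guidance values are illustrative:

```python
import torch

# One prompt, one seed, varying guidance scale.
for gs in (3.0, 5.0, 7.5, 10.0, 12.5):
    generator = torch.Generator("cuda").manual_seed(0)
    image = pipe(prompt="...", guidance_scale=gs, generator=generator).images[0]
    image.save(f"gs_{gs}.png")
```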
Reproduction
I can't give you a code snippet. It's just a basic comparison with A1111 results for heavily stylized models.
Logs
No response
System Info
Who can help?
@DN6 @yiyixuxu sorry for the very vague issue; I wish I could do better.