Closed · brandonwsaw closed this 8 months ago
do you mind posting the image of the mask?
Sure:
I don't suspect it's related to the mask. This one isn't great, but a similar one is used with the A1111 results. And I'm getting the same problem even with very simple masks, like the eyes one.
thank you. if you search the github issues you'll find one discussing inpainting in Diffusers vs A1111. there's some postprocessing you have to do, using the mask to actually composite the inpainted area into the original image. i wanted to see the mask so i could be clearer about what the end result should be.
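For anyone finding this thread later: the compositing step described above can be done with PIL. A minimal sketch, where the file names and blur radius are placeholders:

```python
from PIL import Image, ImageFilter

# placeholder file names; substitute your own images
original = Image.open("original.png").convert("RGB")
inpainted = Image.open("inpainted.png").convert("RGB")
mask = Image.open("mask.png").convert("L")

# optionally soften the seam, similar to A1111's "Mask blur"
soft_mask = mask.filter(ImageFilter.GaussianBlur(4))

# keep inpainted pixels where the mask is white, original pixels elsewhere
Image.composite(inpainted, original, soft_mask).save("composited.png")
```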
sorry, to clarify: are you saying this is something I can solve myself with some postprocessing of the mask beforehand? I'm not sure I found the right issue you're referencing - do you mean this one? https://github.com/huggingface/diffusers/issues/5808
yes, currently it's done via post.
https://github.com/huggingface/diffusers/issues/4782 https://github.com/huggingface/diffusers/issues/3880
https://github.com/huggingface/diffusers/pull/4536 might actually be what you need.
Thanks, will play around with this, but this issue seems different to me - I'm seeing very different inpainting behavior within the mask than I get from A1111, not issues outside the mask. (Although I actually have noticed that in some other projects, so this is good to know.)
well the DDIM in Diffusers has some issues (#6068 comes to mind mostly) and so you might want to try Euler or even Euler A.
Hi @brandonwsaw. It seems that you used `DDIM` in the code but `Euler a` in A1111. Also, diffusers does not yet support several A1111 features, such as `Mask blur`.
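For reference, matching A1111's Euler a sampler in diffusers is a one-line scheduler swap; a minimal sketch, assuming `pipe` has already been constructed:

```python
from diffusers import EulerAncestralDiscreteScheduler

# A1111's "Euler a" corresponds to EulerAncestralDiscreteScheduler in diffusers
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
```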
They are adding mask_blur support. But the inpaint pipeline doesn't work well.
Hi @brandonwsaw
thanks for the issue!
Yeah I think there are lots of differences in settings, most have been summarized by @bghira and @standardAI:

- mask_blur: it is just a pre-processing step for the mask; you can use this line to create a blurred mask and use it instead: `mask_b = mask.filter(ImageFilter.GaussianBlur(0.4))`
- `controlnet_conditioning_scale` values are different: 0.5 in diffusers, 0.3 in auto1111
- schedulers are different
- image sizes are different: the auto1111 config says the output size is 1024; does this mean an upscaler is applied?
- post-processing is different: diffusers does not overlay the output onto the original image, and this should be responsible for the difference we see in the unmasked area
- what is "Pixel Perfect" in the auto1111 settings? what option does it correspond to in the UI?
- what is the "masked_content" mode here? Is it "original"? If so, to achieve something similar in diffusers, you would use a strength value slightly lower than `1.0`, e.g. `0.999`. In diffusers, when you pass `strength == 1.0`, it will use random noise as the initial latent, which is similar to the "latent_noise" mode in auto1111 (see the sketch right after this list).
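To make the last point concrete, a minimal sketch of the two calls, assuming a plain `StableDiffusionInpaintPipeline` with `pipe`, `init_image`, and `mask_image` already set up:

```python
# "original" masked-content behavior: strength just below 1.0 starts from the noised original latents
out_original = pipe("red hair", image=init_image, mask_image=mask_image, strength=0.999).images[0]

# "latent_noise"-like behavior: strength == 1.0 starts from pure random noise inside the mask
out_latent_noise = pipe("red hair", image=init_image, mask_image=mask_image, strength=1.0).images[0]
```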
Thanks all for your input and help. I had some red herrings in there, my fault - I pasted over A1111 settings from a run that didn't match, but I'm seeing the same behavior even when all settings are identical. Here's an example where the settings are identical. You can see A1111 essentially just recolors, while diffusers behaves quite differently inside the mask.
Both are: Euler A, 512x512, CFG 7, ControlNet Weight 0.5, Original Latent, Denoising 1, Mask Blur 0
A1111:
red hair Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 478847657, Size: 512x512, Model hash: a1535d0a42, Denoising strength: 1, Mask blur: 0, ControlNet 0: "Module: none, Model: control_v11p_sd15_inpaint [ebff9138], Weight: 0.5, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0-2-g4afaaf8a
Diffusers:
```python
from diffusers import ControlNetModel, EulerAncestralDiscreteScheduler, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image
import numpy as np
import torch
from PIL import Image

init_image = load_image("image (1).png")
init_image = init_image.resize((512, 512))

generator = torch.Generator(device="cpu").manual_seed(478847657)

mask_image = load_image("hair-mask (1).png")
mask_image = mask_image.resize((512, 512))

def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0
    assert image.shape[0:1] == image_mask.shape[0:1], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

control_image = make_inpaint_condition(init_image, mask_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stablediffusionapi/anything-v5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.safety_checker = None
pipe.requires_safety_checker = False

# generate images
output_images = pipe(
    "red hair",
    negative_prompt='',
    num_inference_steps=20,
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    guidance_scale=7,
    controlnet_conditioning_scale=0.5,
    strength=0.999,
).images

# save the images
for i, image in enumerate(output_images):
    image.save(f'output{i+3}.png')
```
Thanks, interesting to know about mask blur, post processing, and especially the masked content, but I did play with those and they don't seem responsible. I turned off mask blur and used the 0.999 trick in the example above. A1111 also produces a similar result with mask_content set to latent noise.
I'm not exactly sure what Pixel Perfect is, here's the UI, default is False:
@brandonwsaw Interesting... thanks a lot for these additional experiments! Can we set `controlnet_conditioning_scale = 0` in both to compare? Just want to see if the difference is coming from the controlnet part or the inpaint part.
Sure, here's with the control weight at 0:
A1111:
red hair Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 478847657, Size: 512x512, Model hash: a1535d0a42, Denoising strength: 1, Mask blur: 0, ControlNet 0: "Module: none, Model: control_v11p_sd15_inpaint [ebff9138], Weight: 0, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0-2-g4afaaf8a
Diffusers:
"red hair",
negative_prompt='',
num_inference_steps=20,
generator=generator,
image=init_image,
mask_image=mask_image,
control_image=control_image,
guidance_scale=7,
controlnet_conditioning_scale=0.0,
strength=0.999,
).images
@brandonwsaw thanks! will look into it now :)
hi @brandonwsaw There are two things I noticed here:

1. Calling the `PIL.Image.resize()` method on both will cause the image and mask to slightly mismatch, since they start at different sizes; in auto1111 you used "Crop and Resize", which crops the image to 393 x 393 first before resizing to 572 x 572.
2. `make_inpaint_condition`: in this particular example, because the image and mask you provided have different sizes, it decided to use the "image" instead of the "masked image" as the `control_image`; here is an example output from auto1111 when your image and mask have the same size:

To match that, I removed the masking step from the `make_inpaint_condition` function. This script will generate the same result as auto1111:
```python
from diffusers import EulerAncestralDiscreteScheduler, ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image
import numpy as np
import torch
from PIL import Image

init_image = load_image("yiyi_image_girl.png")
generator = torch.Generator(device="cpu").manual_seed(478847657)
mask_image = load_image("yiyi_image_mask_girl.png")

# masking step removed: the full image (not the masked image) is used as the control image
def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

control_image = make_inpaint_condition(init_image, mask_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stablediffusionapi/anything-v5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.safety_checker = None
pipe.requires_safety_checker = False

output_images = pipe(
    "red hair",
    num_inference_steps=20,
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    guidance_scale=7,
    controlnet_conditioning_scale=0.5,
    strength=0.999,
).images

for i, image in enumerate(output_images):
    image.save(f'test_5_output{i+3}.png')
```
image
![yiyi_image_girl](https://github.com/huggingface/diffusers/assets/12631849/89cf30a3-11c1-4053-84da-cf665d315642)
mask
![yiyi_image_mask_girl](https://github.com/huggingface/diffusers/assets/12631849/024d56df-05f5-474a-a699-c88ab358b68d)
output
![yiyi_test_5_output3](https://github.com/huggingface/diffusers/assets/12631849/4b1a88b9-1993-4b37-90b0-f80f52643149)
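As an aside, auto1111's "Crop and Resize" mode can be approximated in PIL so that the same transform is applied to both image and mask; the `crop_and_resize` helper below is hypothetical, not a diffusers API:

```python
from PIL import Image

def crop_and_resize(image: Image.Image, width: int, height: int) -> Image.Image:
    # scale so the image covers the target size, then center-crop the overflow
    scale = max(width / image.width, height / image.height)
    resized = image.resize((round(image.width * scale), round(image.height * scale)), Image.LANCZOS)
    left = (resized.width - width) // 2
    top = (resized.height - height) // 2
    return resized.crop((left, top, left + width, top + height))

# apply the identical transform to image and mask so they cannot drift apart
# init_image = crop_and_resize(init_image, 512, 512)
# mask_image = crop_and_resize(mask_image, 512, 512)
```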
@yiyixuxu, how can I do this for SDXL? Because there is no sdxl-controlnet-inpaint model. https://github.com/Mikubill/sd-webui-controlnet/discussions/2225
Isn't this what you are looking for or did I understand something wrong?
No. I'm looking for the sdxl version of this model.
OK then, sry 😅.
@yiyixuxu thanks for looking into this. I don't think mask size is the issue here - I grabbed a quick screenshot with the snip tool to post here, which is why one of them has slightly different dimensions. But the image/mask I used in my script are both 512x512 (below). And in A1111, I'm using their native inpaint function to draw on top of the original image, so the image/mask must be identical.
Interesting, I'll give that mask inpaint condition a shot, seems neat. But I do suspect there's something going on with controlnet - I'm getting worse results even outside of hair recoloring. Here's an example of changing the mouth; again, results are pretty different. It's harder to see the differences because it's smaller (that's why I picked the hair example to show), but Diffusers has more artifacts, blurry lines, and generally lower quality.
Don't want to take up more of your time if you don't think there's something underlying here, but after spending a lot of time trying to recreate A1111 results with Diffusers across different experiments, it feels like the controlnet for Diffusers isn't as effective for inpainting.
A1111
open mouth, talking, laughing Negative prompt: closed mouth Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 478847657, Size: 1024x1024, Model hash: a1535d0a42, Denoising strength: 1, Mask blur: 0, ControlNet 0: "Module: none, Model: control_v11p_sd15_inpaint [ebff9138], Weight: 0.3, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0-2-g4afaaf8a
Diffusers
```python
from diffusers import ControlNetModel, EulerAncestralDiscreteScheduler, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image
import numpy as np
import torch
from PIL import Image

init_image = load_image("image.png")
init_image = init_image.resize((1024, 1024))

generator = torch.Generator(device="cpu").manual_seed(478847657)

mask_image = load_image("mouth-mask.png")
mask_image = mask_image.resize((1024, 1024))

def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0
    assert image.shape[0:1] == image_mask.shape[0:1], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

control_image = make_inpaint_condition(init_image, mask_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stablediffusionapi/anything-v5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.safety_checker = None
pipe.requires_safety_checker = False

# generate images
output_images = pipe(
    "open mouth, talking, laughing",
    negative_prompt='closed mouth',
    num_inference_steps=20,
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    guidance_scale=7,
    controlnet_conditioning_scale=0.3,
    strength=0.999,
).images

# save the images
for i, image in enumerate(output_images):
    image.save(f'output{i+8}.png')
```
i think the difference comes down to seeds. although A1111's output has worse image compression artifacts.
the inpainted mouth looks bad there, too. some kind of image ghosting, lips where they don't belong or something?
as opposed to Diffusers...
but i don't think it's "much worse results" with Diffusers. am i missing it? i don't have the best eyes.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey, I'm running into the same issue - did you guys find a solution to this small quality difference?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey folks, I'm getting much worse behavior with Diffusers than A1111 when using ControlNet Inpainting. I'm using the exact same model, seed, inputs, etc. but it's clear the inpainting behavior is very different. Below is one example but I have more if it's helpful. Lots of artifacts from Diffusers, A1111 essentially just recolors. Thanks for all your help, let me know how else I can be helpful.
Original Image
Diffusers Inpainting
A1111 Inpainting
Diffusers Script:
A1111 Settings
red hair Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 478847657, Size: 1024x1024, Model hash: a1535d0a42, Denoising strength: 1, Mask blur: 4, ControlNet 0: "Module: none, Model: control_v11p_sd15_inpaint [ebff9138], Weight: 0.3, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0-2-g4afaaf8a
--
Bonus Example (Top: Diffusers, Bottom: A1111)