cortwave opened this issue 3 months ago
Hi, do you have a use case where this is important? I get that this can be considered a bug, but as I see it, this can be fixed in a few ways, for example by resizing the inputs to match `init_image` like it's done with img2img. IMO doing an inpaint with a different aspect ratio generates a really bad image and I don't see why anyone would use it. This is probably the reason why no one noticed this until now.
@asomoza Hi, I don't understand why the inpainting region should have the same size as the initial image. For example, if I have a 2k x 4k image and want to inpaint only a 1024x1024 square in it, why should the initial image size decrease the quality of that inpainted square? Also, this problem is not about aspect ratio: if I use the `height` and `width` of the `init_image`, everything works fine, but if I use `height // 2` and `width // 2` (the same aspect ratio), I still have the problem. So that means the original image must have exactly the same resolution as the inpainting area.
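For illustration, the failing call looks like this (a sketch, assuming a loaded SDXL inpaint pipeline `pipe` and same-size PIL `init_image`/`mask`; the prompt is made up):

```python
# Same aspect ratio, half resolution: the paste-back is still misaligned
# when padding_mask_crop is set.
result = pipe(
    prompt="a red brick wall",  # hypothetical prompt
    image=init_image,
    mask_image=mask,
    height=init_image.height // 2,
    width=init_image.width // 2,
    padding_mask_crop=32,
).images[0]
```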
Here is a use case where it can be an issue. If I have a big image (8k resolution) and want to inpaint some small area, then according to your suggestion I need to downscale my image to 1024x1024 and perform the inpainting. But that means that if I want to get the image back at its initial size, I have to upscale it again, so I just lose image quality in the x8 downscale->upscale round trip.
P.S. I agree that inpainting when the mask crop aspect ratio and the generated crop aspect ratio differ can lead to bad results due to the anisotropic resize during the overlay operation, but that doesn't mean the original image and the generated crop must have the same size.
I understand better now; I think you have a small misunderstanding of what the code does.

First, to clarify the aspect-ratio point, I meant your example: if you don't pass `padding_mask_crop=16` you'll see what I mean.
So, for the rest: the cropping was added later to match the functionality of sd-webui. To get a better quality inpainting, it makes a square around the inpainting (masked) area, upscales it, does the inpaint, then downscales it again and pastes it over the original image.
For this to work, and for the positions of the original image and the "paste back" generated image to match, you need to use a mask that is the same size as the original and generated images. If the mask has a different size than the generated image, the paste-back won't line up and you'll get your result.
To get it to work in your case, you need to do these steps outside the pipeline, not inside. For example, if you have an 8k image and you want to inpaint a small square of it, you'll need to manually crop that square to, say, 1024x1024 and then use that as the source image, as sketched below. diffusers doesn't do that part for you and, as far as I know, neither do the webuis, except for InvokeAI, which does what I'm describing in its Unified Canvas.
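A minimal sketch of that manual workflow, assuming a loaded inpaint pipeline `pipe` and PIL inputs (the file names, crop box, and prompt are made up for illustration):

```python
from PIL import Image

big_image = Image.open("photo_8k.png")  # hypothetical 8k source
big_mask = Image.open("mask_8k.png")    # same size as big_image

# Hypothetical 1024x1024 box around the region to repaint,
# in big_image coordinates (left, upper, right, lower).
box = (3000, 2000, 4024, 3024)

crop = big_image.crop(box)
mask_crop = big_mask.crop(box)

# Inpaint only the crop, at the model's native resolution.
out = pipe(
    prompt="a wooden bench",  # hypothetical prompt
    image=crop,
    mask_image=mask_crop,
    height=1024,
    width=1024,
).images[0]

# Paste the result back at the original coordinates; the full-size
# image keeps its native resolution everywhere else.
big_image.paste(out, box[:2])
big_image.save("photo_8k_inpainted.png")
```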
Edit: forgot about this:

> So that means the original image must have exactly the same resolution as the inpainting area.

What is required for this to work is that the generation matches the original image and the mask, not the inpaint area. So to fix your example, you just need to set the width and height to match the other two; the inpainted area will get upscaled and pasted back correctly.
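In other words, something like this (a sketch, reusing the hypothetical `pipe`, `init_image`, and `mask` from above):

```python
# Generation size matches the original image and the mask, so the
# inpainted crop is upscaled and pasted back at the right place.
result = pipe(
    prompt="a red brick wall",  # hypothetical prompt
    image=init_image,
    mask_image=mask,
    height=init_image.height,
    width=init_image.width,
    padding_mask_crop=32,
).images[0]
```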
> For this to work, and for the positions of the original image and the "paste back" generated image to match, you need to use a mask that is the same size as the original and generated images. If the mask has a different size than the generated image, the paste-back won't line up and you'll get your result.

I agree that the mask should have the same size as the original image, but I don't agree that it must also equal the generated image size. If we cut corresponding crops from the original image and from the mask, and generate an image of the same size as the crop (not the size of the original mask), then we can paste that newly generated image back using the crop coordinates of the original image.
It's really possible to do, but the logic of `apply_overlay` is strange. We need to paste the newly generated image onto an empty canvas of `init_image` size and alpha-blend it with `init_image` using the non-cropped mask, as sketched below. But instead of that, we resize `init_image` to the size of the generated crop.
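A minimal sketch of that proposed logic with PIL (names are illustrative, not the actual diffusers implementation):

```python
from typing import Tuple
import PIL.Image


def overlay_proposed(
    mask: PIL.Image.Image,        # full-size, same size as init_image
    init_image: PIL.Image.Image,  # full-size original
    image: PIL.Image.Image,       # generated crop (any size)
    crop_coords: Tuple[int, int, int, int],  # in init_image coordinates
) -> PIL.Image.Image:
    x0, y0, x1, y1 = crop_coords
    # Resize the generated crop to the crop box, never the original image.
    resized = image.resize((x1 - x0, y1 - y0), PIL.Image.LANCZOS)
    # Paste onto a full-size canvas, then blend with the full-size mask:
    # where the mask is white, take the canvas (generated content);
    # elsewhere keep the original pixels.
    canvas = init_image.copy()
    canvas.paste(resized, (x0, y0))
    return PIL.Image.composite(canvas, init_image, mask.convert("L"))
```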
Edit:

> What is required for this to work is that the generation matches the original image and the mask, not the inpaint area. So to fix your example, you just need to set the width and height to match the other two; the inpainted area will get upscaled and pasted back correctly.

Does this mean that if I want to inpaint a 1024x1024 square on a 4096x4096 image, I have to pass `height=4096, width=4096` to match the size?
I fail to understand your logic here:

> If we cut corresponding crops from the original image and from the mask, and generate an image of the same size as the crop (not the size of the original mask), then we can paste that newly generated image back using the crop coordinates of the original image.

I'll do what I think you're suggesting:

This is what you're masking:

So the crop is this:

And here is where I get lost: you're generating a 1024x1024 image, so where do you want to paste the generated part? Neither the original image nor the mask is square, and the final image looks like this:

So the original crop coordinates don't match the generated image. I don't understand how you expect the code to know where to paste the image back, or why you assume the inpainting will match the generated square image.
> Does this mean that if I want to inpaint a 1024x1024 square on a 4096x4096 image, I have to pass `height=4096, width=4096` to match the size?

Yes and no. If you have a model that supports that resolution and the VRAM to feed it an image that big, yes. With SDXL it's not possible on a consumer card, and the model doesn't support that resolution anyway.
Ok, we have the following signature for the `apply_overlay` method:

```python
def apply_overlay(
    self,
    mask: PIL.Image.Image,
    init_image: PIL.Image.Image,
    image: PIL.Image.Image,
    crop_coords: Optional[Tuple[int, int, int, int]] = None,
) -> PIL.Image.Image:
```
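For context, the behavior described in this thread amounts to roughly the following (a paraphrase of the reported behavior, not the actual diffusers code):

```python
# Paraphrase of the reported behavior, NOT the diffusers implementation:
# the original image is resized into the generated image's coordinate
# system, but crop_coords stay in init_image coordinates, so the paste
# lands in the wrong place whenever the two sizes differ.
base = init_image.resize(image.size)
x0, y0 = crop_coords[:2]  # coordinates from the *original* system
base.paste(image, (x0, y0))
```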
For my example I dumped all the input parameters:

- `mask`
- `init_image`
- `image`
- `crop_coords`: `(55, 102, 616, 663)`

Here is the default output:
How I assume it should really work (I omit the alpha-blending code for simplicity; it doesn't change anything since the mask is binary):

```python
import cv2

# Crop box in init_image coordinates.
x0, y0, x1, y1 = (55, 102, 616, 663)
w = x1 - x0
h = y1 - y0
# cv2.resize expects dsize as (width, height); interpolation must be a keyword.
crop_generated = cv2.resize(generated, (w, h), interpolation=cv2.INTER_LANCZOS4)
init_image[y0:y1, x0:x1] = crop_generated
```
Result of my code snippet:
As a result, we have an inpainted image with the original image's size and aspect ratio, not distorted as in your example.
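For completeness, the alpha-blend step omitted above could look like this (a sketch, assuming `mask` is a single-channel uint8 array in `init_image` coordinates):

```python
import numpy as np

# Blend inside the crop box: where the mask is white keep the generated
# pixels, elsewhere keep the original ones.
alpha = (mask[y0:y1, x0:x1].astype(np.float32) / 255.0)[..., None]
region = init_image[y0:y1, x0:x1].astype(np.float32)
blended = alpha * crop_generated.astype(np.float32) + (1.0 - alpha) * region
init_image[y0:y1, x0:x1] = blended.astype(np.uint8)
```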
What I don't understand in your example and logic is that the mask is not a square, yet you're somehow trying to generate a square inpainting. So, in short, how can this mask:

generate this image:

FYI, what gets generated in the inpainting pipeline is the white part of the image, and when you use `padding_mask_crop`, this is the part that gets pasted back into the original image.

Also, I already said this: if you use any of the other webuis, this is the default behavior. Moreover, I have never seen anything like what you're suggesting, so if you know any other library/software where inpainting works the way you're suggesting, it would help if you provide a link to it.
When we inpaint with `padding_mask_crop`, the generated area is not the white area but something like this:

From the official docs:

> `padding_mask_crop` (int, optional, defaults to None) — The size of the margin in the crop to be applied to the image and masking. If None, no crop is applied to image and mask_image. If `padding_mask_crop` is not None, it will first find a rectangular region with the same aspect ratio as the image that contains all the masked area, and then expand that region by `padding_mask_crop`. The image and mask_image will then be cropped based on the expanded region before being resized to the original image size for inpainting. This is useful when the masked area is small while the image is large and contains information irrelevant to inpainting, such as the background.

Pay attention to this part:

> it will first find a rectangular region with the same aspect ratio as the image that contains all the masked area, and then expand that region by `padding_mask_crop`
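A sketch of that region-finding step (illustrative only; the aspect-ratio matching from the docs is omitted here, and this is not the diffusers implementation):

```python
import numpy as np
import PIL.Image


def find_crop_region(mask: PIL.Image.Image, pad: int = 32):
    """Bounding box of the masked (white) area, expanded by `pad` pixels
    and clamped to the image bounds."""
    m = np.array(mask.convert("L")) > 0
    ys, xs = np.where(m)
    x0 = max(int(xs.min()) - pad, 0)
    y0 = max(int(ys.min()) - pad, 0)
    x1 = min(int(xs.max()) + 1 + pad, mask.width)
    y1 = min(int(ys.max()) + 1 + pad, mask.height)
    return x0, y0, x1, y1  # crop box in image coordinates
```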
Regarding this:
> Also, I already said this: if you use any of the other webuis, this is the default behavior. Moreover, I have never seen anything like what you're suggesting, so if you know any other library/software where inpainting works the way you're suggesting, it would help if you provide a link to it.

The inpainting pipeline could be much easier to use; all we need to do is paste the generated image into the right place.
Oh yeah, you're right, thanks for the correction. I think we should paste just the white masked part and not the whole square, but that's another issue.

So, if I finally understand correctly (sorry about that), what you want is this:

I'll let @yiyixuxu decide about this, because to me, more than a bug (which I think just needs a validation check to fix), you're asking for a new feature that seems to be a breaking change.
Yes, I think it could be a great feature for the inpainting pipeline. However, I believe that `image_processor.apply_overlay` lacks sufficient documentation, making it difficult to understand how it's supposed to work. While the inpainting pipeline itself seems fine, the logic for applying the overlay appears to be somewhat flawed.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
The `apply_overlay` method of `image_processor` works unexpectedly if `init_image` and `image` have different shapes. For example, this can happen when using `StableDiffusionXLInpaintPipeline` with a `padding_mask_crop` value set. According to the `apply_overlay` code, it resizes `init_image` to the `image` size and works with all images in those coordinates. And since `crop_coords` are in `init_image` coordinates (which is expected and intuitive; `StableDiffusionXLInpaintPipeline` uses exactly this coordinate system), it inserts `image` into the wrong place.

Reproduction
This code corresponds to the example from the official documentation.
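The original snippet is not reproduced here; a sketch of that kind of reproduction follows (the model id and image URLs match the diffusers docs' inpainting examples; the prompt, sizes, and `padding_mask_crop` value are assumptions):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((1024, 1024))
mask = load_image(mask_url).resize((1024, 1024))

# height/width differ from init_image while padding_mask_crop is set:
# the inpainted crop gets pasted back at the wrong location.
result = pipe(
    prompt="a black cat sitting on a bench",  # hypothetical prompt
    image=init_image,
    mask_image=mask,
    height=512,
    width=512,
    padding_mask_crop=32,
).images[0]
result.save("output.png")
```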
Output Image
As we can see, the crop was pasted into the wrong location because the `crop_coords` used in the `apply_overlay` method are in the original image's coordinate system, while everything was resized to the crop image size.

Logs
No response
System Info
Who can help?
No response