jackylu0124 opened this issue 5 months ago
Cc: @asomoza
Hi, this is something that is referred to in the model card:
When the strength parameter is set to 1 (i.e. starting in-painting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version.
There are some more experiments in this issue; in the end, the only method that works is to inpaint and then paste the generated part back over the original image, and you'll also need to match the histogram.
IMO it's not a good idea to use a strength of 1.0, which, as you're saying, literally ignores the original image. What you can do is use a generative fill in the area where you want to inpaint; you can also look at differential diffusion or use an inpainting ControlNet.
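As a rough illustration of the inpainting-ControlNet route, here is a minimal sketch using the SD 1.5 inpaint ControlNet, loosely following the diffusers documentation; an SDXL inpainting ControlNet would need different model IDs, and the file names and prompt below are placeholders.

```python
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from PIL import Image

def make_inpaint_condition(image, image_mask):
    # Mark masked pixels with -1 so the ControlNet knows which area to regenerate.
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0
    image[mask > 0.5] = -1.0
    return torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)

init_image = Image.open("original.png").convert("RGB").resize((512, 512))  # placeholder
mask_image = Image.open("mask.png").convert("L").resize((512, 512))        # placeholder
control_image = make_inpaint_condition(init_image, mask_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe(
    "a pink car",                  # example prompt
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    num_inference_steps=30,
).images[0]
result.save("controlnet_inpaint.png")
```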
Hi @asomoza, thank you very much for your reply and insights!
I have the following follow-up questions: even with strength=0.99999999, the context of the original image still has too much influence on the generated result. For example, if the original image contains a blue car and my prompt describes a pink car, and I run inference with the mask over the car, the generated image still shows a blue car instead of a pink car. The only way for me to avoid this is by setting strength=1.0, and the generated result looks as expected when I set strength=1.0 in StableDiffusionInpaintPipeline; in fact, I think strength=1.0 is the default value in StableDiffusionInpaintPipeline's __call__() function. Do you by chance know why using strength=1.0 in StableDiffusionInpaintPipeline works fine (it can change the color of objects in the image based on the prompt without introducing any weird noise), while using strength=1.0 in StableDiffusionXLInpaintPipeline can also change the color of objects based on the prompt but introduces a lot of noise? In other words, I would like to know how I can use strength=1.0 in StableDiffusionXLInpaintPipeline just like I did with StableDiffusionInpaintPipeline, but without all the noise in the generated result. Thank you very much for your time again!
1.- The reason is the difference in model architecture and training. As far as I know, the only trained inpainting model for SDXL is the one from the diffusers team; no one else has trained one, and it has this one problem when using a strength of 1.0. So for the time being this is a common problem without a solution, until someone trains another model. Fooocus has one, but it's a black box: I don't know if it's a trained model or a merge, the author didn't provide any information about it and only made it for Fooocus with a lot of hard-coded stuff, so it's complicated to port to other solutions.
2.- Generative fill means that you fill the area with something that resembles what you want; there are a couple of methods for this, for example LaMa, PatchMatch, or the OpenCV ones. This is the best method for inpainting when you want to remove or change an object, and it also works if you paint and guide the generation yourself. Automatic1111 has some options for this too, for example filling the area with noise or painting it gray.
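As a rough illustration of a generative-fill pre-pass, here is a minimal sketch using OpenCV's classical inpainting (one of the "OpenCV ones" mentioned above); the file names are placeholders, and the pre-filled image would then be passed to the diffusion inpainting pipeline with a strength below 1.0.

```python
import cv2
import numpy as np
from PIL import Image

image = cv2.imread("original.png")                    # BGR, placeholder file name
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
mask = (mask > 127).astype(np.uint8) * 255            # binary mask, 255 = area to fill

# Roughly fill the masked region so the diffusion model no longer "sees"
# the original content there.
filled = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)

# Use this pre-filled image as the init image for the inpainting pipeline.
init_image = Image.fromarray(cv2.cvtColor(filled, cv2.COLOR_BGR2RGB))
init_image.save("prefilled.png")
```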
I still owe some inpainting guides, but maybe you can learn something from the outpainting ones I did, where I show and apply some of these techniques.
If you provide me with some images and what you want to do (where you need a strength of 1.0), I can give you a quick guide on how to achieve it with some other techniques instead.
Hi @asomoza, thank you for your fast reply and insights!
I am mostly looking for a programmatic solution as opposed to UI tools. Thanks a lot for sharing the link to your outpainting guides! I will take a look at those first.
Regarding the SDXL inpainting model and its training, do you know if the training script used for the stabilityai/stable-diffusion-2-inpainting model is open sourced? Also, is the training script for the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model that was trained by the diffusers team based on the training script used for the stabilityai/stable-diffusion-2-inpainting model? And is the training script for the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model open sourced?
Thanks a lot again!
I mentioned Automatic1111 just so you know that sometimes filling the area with noise or a gray color can also work, since people use that in the UI.
The stable-diffusion-2-inpainting model was provided by Stability AI from the beginning. I didn't use SD2 that much and I don't really know if they released the training code; it's probably better to look or ask about it in their repo.
As for the training code for stable-diffusion-xl-1.0-inpainting-0.1: no, there is no open-sourced code for training an SDXL inpainting model, not in diffusers and, as far as I know, not anywhere else.
I see, thank you very much for your detailed reply! So to confirm, the stable-diffusion-xl-1.0-inpainting-0.1 model is trained by the diffusers team, but it's not open sourced, right?
The model weights have the same license as the original; this one in particular has an Open RAIL++-M License, and if you're asking whether you can use it commercially, yes.
If you provide me with some images and what you want to do (where you need a strength of 1.0), I can give you a quick guide on how to achieve it with some other techniques instead.
So here's an example to better illustrate the issue I mentioned above, and also the goal I want to achieve.
The following is the original image:
The following is the original image's mask:
I used the same script and seed as in the code in the previous message, with the prompt "White dress shirt, high quality, 4k" and the following settings:
Result generated with StableDiffusionXLInpaintPipeline and the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model with strength=0.99999999:
Result generated with StableDiffusionXLInpaintPipeline and the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model with strength=1.0:
Result generated with StableDiffusionInpaintPipeline and the stabilityai/stable-diffusion-2-inpainting model with strength=1.0:
As you can see, unless I set strength=1.0 in StableDiffusionXLInpaintPipeline with the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model, the original image's context (the black color of the jacket) has a much heavier influence than the prompt (the white color in "White dress shirt, high quality, 4k"). What I would like to achieve is for the inpainting to be directed more by the prompt than by the original image's context, and so far I have only been able to achieve that by setting strength=1.0. But as you can also see, setting strength=1.0 in StableDiffusionXLInpaintPipeline with the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model also introduces a lot of noise in the generated result.
I would really appreciate any insights you have on how I could achieve this goal. Thanks a lot again!
OK, first of all, you're using a 512px image with SDXL, which is really bad; that is the main reason you're getting those weird borders around the image. For example, with a 1024px image:
But then we still see the discoloration and the noise over the white background; it's not that evident if you don't use a white background, though.
Since you're using an extreme case, where you want to inpaint something white over something black, you'll need to remove the black first. I suggest using LaMa for the best results, but since you're using a strength of 1.0 in your example, which means you don't care about the previous content of the image, you can literally just erase what was there before, maybe paint it gray or white.
If I have time later I'll give it a try and show you an example.
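For reference, a minimal sketch of the "erase it and paint it gray or white" idea, assuming PIL and NumPy are available; the file names and the 1024x1024 size are placeholders.

```python
import numpy as np
from PIL import Image

image = Image.open("original.png").convert("RGB").resize((1024, 1024))  # placeholder
mask = Image.open("mask.png").convert("L").resize((1024, 1024))         # placeholder

img_np = np.array(image)
mask_np = np.array(mask) > 127          # True where we want to inpaint

# Neutralize the masked region so the black jacket no longer dominates the result.
img_np[mask_np] = 128                   # mid-gray; white (255) also works
init_image = Image.fromarray(img_np)

# Pass init_image (instead of the original) plus the mask to the SDXL
# inpainting pipeline, using a strength below 1.0 (e.g. 0.85).
init_image.save("prefilled_gray.png")
```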
Hi @asomoza , thank you very much for your detailed reply and experiment!
The reason I am using a 512x512 input image and mask is to compare the generated results from StableDiffusionInpaintPipeline (with the stabilityai/stable-diffusion-2-inpainting model) and StableDiffusionXLInpaintPipeline (with the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model), because StableDiffusionInpaintPipeline (with the stabilityai/stable-diffusion-2-inpainting model) takes a 512x512 input. Also note that in the script I pasted in my previous message, I resize the image to 1024x1024 before passing it into StableDiffusionXLInpaintPipeline (with the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model).
But regardless, my main concern is not the "weird borders" around the image, but rather the "discoloration and the noise" you mentioned. For example, the color of the face of the person in the generated image in your experiment looks a lot more saturated than in the original image, and there is also noise covering the generated image. And to confirm, in order to resolve the issue (where the original image's context has much greater influence than the prompt does), I can set strength to less than 1.0 and try replacing the area to be inpainted with gray or white, right?
Would replacing the area to be inpainted with pure black or randomized pixel colors solve the issue (where the original image's context has much greater influence than the prompt does) as well?
Thanks for your help again!
I see, thank you very much for your detailed reply! So to confirm, the stable-diffusion-xl-1.0-inpainting-0.1 model is trained by the diffusers team, but it's not open sourced, right?
@asomoza Apologies for the confusion earlier; what I meant to ask is whether the training code/script used by the diffusers team to train the stable-diffusion-xl-1.0-inpainting-0.1 model is open sourced, and if so, where can I find it?
Thanks a lot again!
the color of the face of the person in the generated image in your experiment looks a lot more saturated than in the original image, and there is also noise covering the generated image
The difference in saturation and the noise in the background can be fixed by pasting just the inpainted area back onto the original image and then matching the histogram; I did that in the post I linked before. That's one solution to this problem.
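A minimal sketch of that paste-back plus histogram-matching step, assuming scikit-image (0.19+ for channel_axis) and PIL are available; the file names are placeholders.

```python
import numpy as np
from PIL import Image
from skimage.exposure import match_histograms

original = np.array(Image.open("original.png").convert("RGB"))         # placeholder
generated = np.array(Image.open("sdxl_inpainted.png").convert("RGB"))  # placeholder
mask = np.array(Image.open("mask.png").convert("L")) > 127             # True = inpainted area

# Match the generated image's color distribution to the original image.
matched = match_histograms(generated, original, channel_axis=-1)
matched = np.clip(matched, 0, 255).astype(np.uint8)

# Keep the original pixels everywhere except the masked (inpainted) region.
result = np.where(mask[..., None], matched, original)
Image.fromarray(result).save("composited.png")
```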
in order to resolve the issue (where the original image's context has much greater influence than the prompt does), I can set strength to less than 1.0 and try replacing the area to be inpainted with gray or white, right?
Yes. Also, depending on the use case, you can use a generative fill for this.
Would replacing the area to be inpainted with pure black or randomized pixel colors solve the issue (where the original image's context has much greater influence than the prompt does) as well?
In this case, no. Pure black would have the same problem as the original image: it's hard for the inpainting model to change something black into something white unless you set the strength to 1.0, which tells the model to completely ignore what was in that area before. Random pixels could work, but not that well, and since it's random, if you get too many dark pixels you'll have the same problem.
@asomoza Apologies for the confusion earlier; what I meant to ask is whether the training code/script used by the diffusers team to train the stable-diffusion-xl-1.0-inpainting-0.1 model is open sourced, and if so, where can I find it?
This code is not public and hasn't been shared. This library only shares training code as basic examples and encourages users to take that code, learn from it, and adapt it to their specific tasks. I don't think there is any training code available for any inpainting model, the same as, for example, for multi-aspect-ratio or other more advanced training setups.
Hi @asomoza, thank you for your detailed reply and explanation! Also, sorry about my late reply. Would you mind sharing the link to the post you mentioned in
The difference in saturation and the noise in the background can be fixed by pasting just the inpainted area back onto the original image and then matching the histogram; I did that in the post I linked before. That's one solution to this problem.
where you demonstrate "pasting" and "histogram matching"?
Thanks a lot for the help again!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
I am currently comparing the inpainting results of the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model and the stabilityai/stable-diffusion-2-inpainting model, and I noticed that the strength parameter in the __call__() function of StableDiffusionInpaintPipeline defaults to 1.0, whereas the strength parameter in the __call__() function of StableDiffusionXLInpaintPipeline defaults to 0.9999.

What I want to achieve is to use strength=1.0 in the StableDiffusionXLInpaintPipeline pipeline, because otherwise the original content of the image has a much larger impact on the generated result than the prompt does. For example, if the original image has a blue car and my prompt describes a pink car, using strength=0.9999 or even strength=0.99999999 would still show a blue car in the generated result. The only way I can effectively avoid this behavior is by setting strength=1.0 when using the StableDiffusionXLInpaintPipeline pipeline. However, using strength=1.0 in the StableDiffusionXLInpaintPipeline pipeline introduces a lot of noise in the generated image, and I have tried increasing the number of inference steps but it does not help with removing the noise.

For example, the following are the original image (with white pixels added in the margin to better illustrate the weird noise) and the original image's corresponding mask, as well as the inpainted results of the two pipelines. The result from the StableDiffusionXLInpaintPipeline pipeline has a lot of noise.

P.S. I also read something that sounds similar in https://github.com/huggingface/diffusers/issues/4392, but I am not sure whether the noise I am seeing here is the same thing as what's discussed in that issue. I would also like to know how I can resolve the weird noise issue when using StableDiffusionXLInpaintPipeline with the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model and strength=1.0.

Original Image:
Original Image's Corresponding Mask:
Inpainted Result (StableDiffusionInpaintPipeline with strength=1.0 and the prompt "Furry lion sitting on a bench, high quality, 4k"):
Inpainted Result (StableDiffusionXLInpaintPipeline with strength=1.0 and the prompt "Furry lion sitting on a bench, high quality, 4k"):

Reproduction

The following is the code I used to generate the inpainted results for both StableDiffusionInpaintPipeline (with the stabilityai/stable-diffusion-2-inpainting model) and StableDiffusionXLInpaintPipeline (with the diffusers/stable-diffusion-xl-1.0-inpainting-0.1 model). You can change the boolean value on the line USE_SDXL_INPAINT = True # <=== Change this to generate the inpainted result of the respective pipeline/model. I have also pasted the original image and its corresponding mask image in the "Describe the bug" section above.

Code:
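Since the original script is not included here, the following is only a minimal sketch of a comparison along the lines described above; the file names, seed, and step count are placeholders.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline, StableDiffusionXLInpaintPipeline
from PIL import Image

USE_SDXL_INPAINT = True  # <=== Change this

image = Image.open("original.png").convert("RGB")   # placeholder
mask = Image.open("mask.png").convert("L")          # placeholder
prompt = "Furry lion sitting on a bench, high quality, 4k"
generator = torch.Generator("cuda").manual_seed(0)  # placeholder seed

if USE_SDXL_INPAINT:
    pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
        "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
    ).to("cuda")
    size = (1024, 1024)   # SDXL works at 1024px
else:
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
    ).to("cuda")
    size = (512, 512)     # SD2 inpainting works at 512px

result = pipe(
    prompt=prompt,
    image=image.resize(size),
    mask_image=mask.resize(size),
    strength=1.0,
    num_inference_steps=30,
    generator=generator,
).images[0]
result.save("inpainted.png")
```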
Logs
No response
System Info
System: Windows
GPU: RTX 3090
diffusers-cli env output:
diffusers version: 0.27.2

Who can help?

@yiyixuxu @sayakpaul