leejet / stable-diffusion.cpp

Stable Diffusion and Flux in pure C/C++
MIT License

Support Inpainting #105

Open 10undertiber opened 9 months ago

10undertiber commented 9 months ago

It would be great to add input parameters to the current SD cli to specify an input and mask file to run the inpainting. For example:

./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely dog" --image ../input/alovelybench.png --mask ../input/alovelybench.mask.png

The input image: *alovelybench* (image attachment)

The input mask: *alovelybench mask* (image attachment)

The output: *alovelybench output* (image attachment)

Here are some references:

  1. https://stable-diffusion-art.com/inpainting_basics
  2. https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint
FSSRepo commented 9 months ago

@leejet I believe that is done by adding noise only to the white part of the latent image and, in the decoder, keeping the pixels of the black part unchanged. However, the image-to-image mode is also causing quality issues: the images appear overly smoothed, blurred, and distorted. Even with a strength setting as low as 0.05, the final image bears little resemblance to the original.
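The masked-blending idea described above can be sketched as a per-step latent blend: after each denoising step, the unmasked region is overwritten with a re-noised copy of the original latent, so only the masked region is actually regenerated. This is a hypothetical helper under assumed names, not part of the stable-diffusion.cpp API:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: blend the sampler's current latent with a re-noised
// copy of the original image's latent. mask values: 1.0 = repaint this
// element, 0.0 = keep the (re-noised) original. All tensors are flattened
// to the same length here for simplicity.
std::vector<float> blend_latents(const std::vector<float>& denoised,
                                 const std::vector<float>& renoised_original,
                                 const std::vector<float>& mask) {
    std::vector<float> out(denoised.size());
    for (size_t i = 0; i < denoised.size(); ++i) {
        out[i] = mask[i] * denoised[i] + (1.0f - mask[i]) * renoised_original[i];
    }
    return out;
}
```

In a real sampler loop this blend would run once per step, with `renoised_original` noised to the current step's noise level.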

leejet commented 9 months ago

> @leejet I believe that is done by adding noise only to the white part of the latent image and, in the decoder, keeping the pixels of the black part unchanged. However, the image-to-image mode is also causing quality issues: the images appear overly smoothed, blurred, and distorted. Even with a strength setting as low as 0.05, the final image bears little resemblance to the original.

It seems that img2img has some issues and the results are inconsistent with sd-webui.

FSSRepo commented 9 months ago

@leejet I think we should first solve that problem before considering adding the inpainting feature.

Inpainting models require a UNet input with 9 channels: 4 for the usual noisy latent, 4 for the latent of the masked input image, and 1 for the downsampled mask. There may also be a need for a slight modification in the autoencoder, but I will continue researching.

leejet commented 9 months ago

> @leejet I think we should first solve that problem before considering adding the inpainting feature.
>
> Inpainting models require a UNet input with 9 channels: 4 for the usual noisy latent, 4 for the latent of the masked input image, and 1 for the downsampled mask. There may also be a need for a slight modification in the autoencoder, but I will continue researching.

Yes, the first step is to support the inpaint model. We can determine whether the currently loaded weights belong to an inpaint model from the shape of the weights.
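One way to do that shape check, sketched under the assumption that the loader can look up the UNet's first convolution weight: in SD 1.x checkpoints, `model.diffusion_model.input_blocks.0.0.weight` has shape `[320, 4, 3, 3]`, while inpainting checkpoints widen the input-channel dimension to 9, giving `[320, 9, 3, 3]`. This is a hypothetical helper, not existing stable-diffusion.cpp code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical check: decide whether a checkpoint is an inpainting model
// from the shape of the UNet's input conv weight, passed here as
// [out_channels, in_channels, kh, kw]. A 9-channel input marks an
// inpaint model; 4 channels is a regular txt2img/img2img model.
bool is_inpaint_model(const std::vector<int64_t>& input_conv_shape) {
    return input_conv_shape.size() == 4 && input_conv_shape[1] == 9;
}
```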

Amin456789 commented 9 months ago

Hope someone makes a nice GUI for it; inpainting will be much easier when you can paint the mask directly in a GUI. A C++ build with a GUI would be great.

Amin456789 commented 8 months ago

@FSSRepo I see you are working on a web UI (can't wait for it). If possible, please add outpainting as well; it would be great to have. Also, a question: I have Dark Reader in my browser; will it make your web UI's background dark too? Working on a white background at night is very hard, in my opinion.

programmbauer commented 4 months ago

Would it be possible to implement a simple form of inpainting where the user specifies a rectangular region via command-line parameters, and only this region of the input image would be changed? For example, the user could pass four integers (x, y, height, width) defining the top-left corner and the dimensions of the rectangular region.

balisujohn commented 1 month ago

@leejet I'm pretty interested in fixing img2img and adding inpainting; do you have any pointers as to why it's currently not matching stable diffusion webui?