runwayml / stable-diffusion

Latent Text-to-Image Diffusion

Mismatched performance between Colab, the Runway website, and HuggingFace #7

Open weijiawu opened 1 year ago

weijiawu commented 1 year ago

Runway inpainting in Colab and on HuggingFace works worse than on the website. During generation the entire picture is distorted, even the area that was not selected; this leads, for example, to deformation of the face. 1: original, 2: Colab, 3: Runway

weijiawu commented 1 year ago
[image: comparison of original, Colab, and Runway inpainting results]
weijiawu commented 1 year ago

photograph of a car

[image: generated result for the prompt above]
weijiawu commented 1 year ago
[image: additional comparison]
weijiawu commented 1 year ago

It seems the performance of the Runway website is better than that of the other platforms.

Dima-369 commented 1 year ago

I am pretty sure if you apply the code in https://github.com/runwayml/stable-diffusion/issues/5#issuecomment-1289915959, this is not an issue anymore.

From your example, it also looks like the step count is too low; maybe try 100 or 200.
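
For reference, a minimal sketch of running the runwayml/stable-diffusion-inpainting checkpoint through the diffusers inpainting pipeline with a higher step count (file paths and parameter values are placeholders, not the code from the linked comment):

```python
# Minimal sketch: inpaint with the runwayml checkpoint and more denoising steps.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("car.png").convert("RGB").resize((512, 512))   # placeholder path
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = area to repaint

result = pipe(
    prompt="photograph of a car",
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=100,  # higher than the default 50, as suggested above
    guidance_scale=7.5,
).images[0]
result.save("inpainted.png")
```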


Even Runway's Erase and Replace tool on their homepage gives sub-optimal results, as the masked area is still noticeably 'different' from the rest of the picture, even when the same content is rendered.

I find that img2img just gives better results overall, but it apparently changes the entire image, which is sometimes hard to blend back in even when masked. A generic compositing trick is sketched below.
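
One way to keep the unmasked region untouched (a general compositing trick, not necessarily what the linked comment does) is to paste the original pixels back over the generated image using the mask:

```python
# Sketch: composite the generated image into the original so that only the
# masked region comes from the model's output. File names are placeholders.
from PIL import Image

original = Image.open("original.png").convert("RGB")
generated = Image.open("generated.png").convert("RGB").resize(original.size)
mask = Image.open("mask.png").convert("L").resize(original.size)  # white = inpainted area

# Where the mask is white, take pixels from `generated`; elsewhere keep `original`.
blended = Image.composite(generated, original, mask)
blended.save("blended.png")
```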

Question406 commented 1 year ago

I believe they're not using the same code, which is pretty annoying and confusing.

This is from the HuggingFace pipeline (https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py), where the initial latents for denoising are randomly sampled from a Gaussian.

[screenshot: latent initialization code from the diffusers pipeline]
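
Roughly, that initialization looks like the following (a simplified sketch of the diffusers code, not a verbatim copy):

```python
# Simplified sketch of the diffusers inpainting pipeline's initialization:
# the starting latents are pure Gaussian noise, independent of the input image.
import torch

def prepare_random_latents(batch_size, num_channels, height, width,
                           scheduler, generator=None,
                           device="cuda", dtype=torch.float16):
    shape = (batch_size, num_channels, height // 8, width // 8)
    latents = torch.randn(shape, generator=generator, device=device, dtype=dtype)
    # Scale by the scheduler's initial noise sigma, as the pipeline does.
    return latents * scheduler.init_noise_sigma
```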

This is from the HuggingFace Space (https://huggingface.co/spaces/runwayml/stable-diffusion-inpainting/blob/main/inpainting.py), which uses the latent of the raw image as the initialization for denoising.

[screenshot: latent initialization code from the Space's inpainting.py]
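
A simplified sketch of that style of initialization (the helper name is made up; the Space's inpainting.py is the authoritative version):

```python
# Simplified sketch: start denoising from the VAE latent of the original image,
# noised to the chosen starting timestep, instead of from pure Gaussian noise.
import torch

def prepare_image_latents(image, vae, scheduler, timestep, generator=None):
    # `image` is a (1, 3, H, W) tensor scaled to [-1, 1].
    init_latents = vae.encode(image).latent_dist.sample(generator) * 0.18215
    noise = torch.randn(init_latents.shape, generator=generator,
                        device=init_latents.device, dtype=init_latents.dtype)
    # Add noise appropriate for `timestep`, so denoising starts partway through.
    return scheduler.add_noise(init_latents, noise, timestep)
```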