haofanwang / ControlNet-for-Diffusers

Transfer the ControlNet with any basemodel in diffusers🔥
MIT License

Require more information regarding inpainting #5

Open ghpkishore opened 1 year ago

ghpkishore commented 1 year ago

Hi!

@haofanwang

I am trying to understand how to perform inpainting with ControlNet, which you mentioned in the third part of the README. I get extremely poor results compared to normal inpainting with the same model, so I feel something is not quite right in the code. I tried the canny edge detection annotator.

My process is structured as follows: I take an input image and a mask, which is what a typical inpainting pipeline requires. Then, using the annotators, I create a canny image from a reference image and pass it as the control hint.

However, the output in no way preserves the fidelity of the original input, and it also does not follow the canny edges properly.

Code is below:

import torch
import numpy as np
from PIL import Image
from diffusers.utils import load_image
from diffusers import StableDiffusionInpaintPipeline, StableDiffusionControlNetInpaintPipeline
from annotator.util import resize_image, HWC3
from annotator.canny import CannyDetector

def getCannyImage(input_image, image_resolution, low_threshold, high_threshold):
    """Run the canny annotator on a PIL image and return the hint as a PIL image."""
    input_image = np.array(input_image)
    apply_canny = CannyDetector()
    with torch.no_grad():
        img = resize_image(HWC3(input_image), image_resolution)
        H, W, C = img.shape
        detected_map = apply_canny(img, low_threshold, high_threshold)
        detected_map = HWC3(detected_map)
    # Note: 255 - detected_map inverts the edge map (black edges on a white background).
    return Image.fromarray(255 - detected_map)

# Load the ControlNet (canny) pipeline and the inpainting pipeline, then swap the
# inpainting UNet into the ControlNet pipeline as described in the README.
pipe_control = StableDiffusionControlNetInpaintPipeline.from_pretrained("models/control_sd15_canny", torch_dtype=torch.float16).to('cuda')
pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained("models/stable-diffusion-inpainting", torch_dtype=torch.float16).to('cuda')
pipe_control.unet = pipe_inpaint.unet
pipe_control.unet.in_channels = 4

# Input image and mask for inpainting, plus a reference image for the canny hint.
image = load_image("./inputImages/input.png")
mask = load_image("./inputImages/mask.png")
canny_input = load_image("./inputImages/reference.png")

# Canny edge detection parameters
image_resolution, low_threshold, high_threshold = 512, 100, 200
control_image = getCannyImage(canny_input, image_resolution, low_threshold, high_threshold)

image = pipe_control(prompt="Woman performing Yoga on a tennis court",
                     negative_prompt="lowres, bad anatomy, worst quality, low quality",
                     controlnet_hint=control_image,
                     image=image,
                     mask_image=mask,
                     num_inference_steps=100).images[0]

image.save("inpaint_canny.jpg")

Reference and input images are also below; the mask is essentially a mask of the input image: reference, input

Please let me know if I am missing something.

haofanwang commented 1 year ago

It just looks fine to me. @ghpkishore

(1) To make sure everything goes well, could you first try our example with segmentation and check that you can get the same result as ours?

(2) Could you post your control_image and mask here? The mask should be binary.
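A minimal sketch of how one might binarize the mask before passing it in (the 127 threshold and the output file name are placeholders, not from this thread):

import numpy as np
from PIL import Image

# Load the mask as grayscale and force every pixel to pure black or white.
mask = np.array(Image.open("./inputImages/mask.png").convert("L"))
mask = np.where(mask > 127, 255, 0).astype(np.uint8)
Image.fromarray(mask).save("./inputImages/mask_binary.png")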

ghpkishore commented 1 year ago

Hi @haofanwang. I did check whether the mask was binary, and it initially wasn't. I then fixed it to ensure that it is, but even after that it does not work correctly. I am adding all five images here: input, mask, reference image for canny, canny output, and final output.

(1) To make sure everything goes well, could you first try our example with segmentation and check that you can get the same result as ours? - Yes, I tried your example and it worked correctly.

I do not know whether it fails because I am inpainting over a much larger area. But normal inpainting works on the same inputs.

Tennis_reference Tennis_input

Tennis_canny Tennis_output

Tennis_mask

haofanwang commented 1 year ago

I will try your images to check what's going wrong.

haofanwang commented 1 year ago

I can reproduce it, but it seems to be related to the base model. Even if I directly use the official demo, you can see the face is still distorted. So don't worry, you are already on the right track. By the way, which normal inpainting model do you use? Is it also a Stable Diffusion model? If so, you can use it via our script. @ghpkishore

Screenshot 2023-02-24 19 12 28

ghpkishore commented 1 year ago

Thanks @haofanwang. Is there any way I can fix it then? I should be able to use the SD 2.1 inpainting model if need be, right? Also, regarding the binarization of the mask, I do not feel it is necessary, as your code already handles it inside the prepare_mask_and_masked_image function.
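For reference, the thresholding inside prepare_mask_and_masked_image works roughly like the snippet below (a paraphrase of the diffusers inpaint pipeline of that era, not the exact source):

import numpy as np
import torch
from PIL import Image

mask = Image.open("./inputImages/mask.png").convert("L")
mask = np.array(mask).astype(np.float32) / 255.0
mask = mask[None, None]   # add batch and channel dimensions
mask[mask < 0.5] = 0      # anything below 0.5 becomes background
mask[mask >= 0.5] = 1     # anything at or above 0.5 becomes the inpainting region
mask = torch.from_numpy(mask)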

The model I was using is the runwayml inpainting model, the one you showcased above.

haofanwang commented 1 year ago

I'm not sure whether the naming is consistent between SD 1.5 and SD 2.1; if it is, then yes. You can give it a try and report here if it fails. I can support it once I have time, or it would be much appreciated if you could help with it. @ghpkishore

ghpkishore commented 1 year ago

@haofanwang It doesn't work directly with the SD 2.1 inpainting model. I need to figure out why. The error I got was "mat1 and mat2 shapes cannot be multiplied (154x768 and 1024x320)".

I will try different ControlNets for inpainting with 1.5 and then move on to 2.1.

Also, is it possible to stack multiple different condition nets together? I know it is under open discussion for the ControlNet library; however, since it is possible with the T2I-Adapter, I want to figure out whether inpainting works with it too. It seems like it would be a similar file to the ControlNet and inpainting integration.

haofanwang commented 1 year ago

For SD 2.1, it may not be compatible with existing ControlNet models that were trained on SD 1.5.
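A likely reason, though this is an assumption on my part rather than something confirmed in this thread, is the text-encoder width: SD 1.x uses a 768-dimensional CLIP text encoder while SD 2.x uses a 1024-dimensional OpenCLIP encoder, which would explain a 768-vs-1024 shape mismatch. A quick way to inspect both base models:

from diffusers import StableDiffusionInpaintPipeline

pipe_15 = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe_21 = StableDiffusionInpaintPipeline.from_pretrained("stabilityai/stable-diffusion-2-inpainting")

# Text-embedding width each UNet expects in its cross-attention layers.
print(pipe_15.text_encoder.config.hidden_size, pipe_15.unet.config.cross_attention_dim)  # 768 768
print(pipe_21.text_encoder.config.hidden_size, pipe_21.unet.config.cross_attention_dim)  # 1024 1024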

For multi-control, I haven't supported it yet. I know the diffusers team is working on it, so I don't want to make this project too heavy. But I will take a look at it.

ghpkishore commented 1 year ago

Oh wow, thanks for letting me know. I just started looking into where the difference comes from. I thought that if I could identify which part of the code raises the matmul error, I would be able to fix it. I didn't know that it could be due to a difference in the base models they were trained on.

ghpkishore commented 1 year ago

@haofanwang I do not think the model is at fault. I really think there is something wrong with my canny edge implementation.

I tried with the segmentation model and it seemed to work, so something seems off with the canny edge path in my code. I will keep working on it.

haofanwang commented 1 year ago

To verify, you can just use the web demo to generate a canny edge image, check whether it is the same as yours, and see whether that canny image solves your problem. Don't dive into the code directly; that is really maddening. @ghpkishore
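One thing worth checking while comparing, as an editorial aside rather than something from this thread: the posted code builds the hint as 255 - detected_map, i.e. black edges on a white background, while the canny hints in the reference ControlNet examples have white edges on a black background (the demo only shows the inverted map for visualization). A quick polarity check, with a hypothetical file name:

import numpy as np
from PIL import Image

hint = np.array(Image.open("my_canny_hint.png").convert("L"))
# Edge maps are sparse, so a hint with white edges on black should have a low
# mean pixel value; a high mean suggests the map is inverted.
print("mean pixel value:", hint.mean())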

Also, as mentioned in https://github.com/haofanwang/ControlNet-for-Diffusers/issues/10, the conversion may lead to unexpected results for reasons that are not yet clear.