mikonvergence / ControlNetInpaint

Inpaint images with ControlNet
MIT License

How to get multiple images for multiple prompts #9

Closed satwiksunnam19 closed 1 year ago

satwiksunnam19 commented 1 year ago

Hello @mikonvergence, your work is awesome. I have a question that has been on my mind for days.

I have 10-15 different prompts that I want to run inference with on a single image. Also, on a T4 GPU, the memory fragments even for a single image and a single prompt.

Thanks and Regards, Satwik Sunnam.

mikonvergence commented 1 year ago

Hey! Thanks.

As long as you don't exceed the GPU's memory limit, you can do that from the interface perspective by supplying a list of prompts.

Here is an example I ran with the canny edge setup from the supplied colab notebook:

```python
pipe.to('cuda')

text_prompt=["squirrel","elephant"]

# generate image
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    controlnet_conditioning_scale=0.5,
    mask_image=mask_image
).images

plt.subplot(1,2,1)
plt.imshow(new_image[0])
plt.subplot(1,2,2)
plt.imshow(new_image[1])
```

with the following result: *(result image)*

I don't know if this answers your question or if you were asking about a more efficient way to compute this. That said, it is unlikely I will be making further changes in this respect, as I want this pipeline to resemble the related diffusers pipelines (inpainting and controlnet) as much as possible. Open to discussion though!
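If T4 memory is the bottleneck with 10-15 prompts, one caller-side workaround (not part of this repo) is to run the prompts in small batches rather than all at once. A minimal sketch, assuming `pipe`, `image`, `canny_image`, and `mask_image` are set up as in the colab example above; `chunk_prompts` is a hypothetical helper, not a library function:

```python
# Hypothetical helper: split the prompt list into batches of at most
# `batch_size` so each pipe(...) call stays within GPU memory.
def chunk_prompts(prompts, batch_size):
    """Yield successive batches of at most `batch_size` prompts."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i:i + batch_size]

# Usage sketch (assumes a configured pipeline as in the notebook):
# all_images = []
# for batch in chunk_prompts(text_prompts, batch_size=2):
#     out = pipe(batch,
#                num_inference_steps=20,
#                image=image,
#                control_image=canny_image,
#                controlnet_conditioning_scale=0.5,
#                mask_image=mask_image).images
#     all_images.extend(out)
```

The trade-off is wall-clock time for peak memory: smaller batches mean more pipeline calls but a lower memory high-water mark per call.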

satwiksunnam19 commented 1 year ago

Does this work on a standard GPU like a T4?

mikonvergence commented 1 year ago

Yes

mikonvergence commented 1 year ago

Also, in case you're looking for an example with multiple images and multiple prompts, you can do that by supplying stacked (batched) tensors.

Example based on the canny edge example from the colab:

1. Conversion to tensors (if necessary)

```python
import torch
from torchvision.transforms import ToTensor

img1=ToTensor()(image).unsqueeze_(0)
mask1=ToTensor()(mask_image).unsqueeze_(0)
canny1=ToTensor()(canny_image).unsqueeze_(0)

img2=torch.flip(img1,[-1])
mask2=torch.flip(mask1,[-1])
canny2=torch.flip(canny1,[-1])

img_stack=2*torch.cat([img1,img2],0)-1 # convert to [-1,+1] range
mask_stack=torch.cat([mask1,mask2],0)[:,0,:,:]
canny_stack=torch.cat([canny1,canny2],0)
```

>💡 In this example, the second image is the first image **horizontally flipped**

2. Pipeline

```python
text_prompt=["squirrel", "elephant"]

# generate image
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=img_stack,
    control_image=canny_stack,
    controlnet_conditioning_scale=0.5,
    mask_image=mask_stack
).images
```

3. Result:

```python
for idx,img in enumerate(new_image):
    plt.subplot(1,len(new_image),1+idx)
    plt.imshow(img)
```

*(result image)*
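The stacking and normalization logic in step 1 can be sanity-checked without a GPU or model weights. A minimal sketch using numpy as a stand-in for the torch calls above (shapes and dummy data are assumptions for illustration; the torch code behaves the same way shape-wise):

```python
import numpy as np

# Dummy (B, C, H, W) image in [0, 1], standing in for ToTensor() output.
img1 = np.random.rand(1, 3, 8, 8)
img2 = np.flip(img1, axis=-1)  # horizontal flip, like torch.flip(img1, [-1])

# Stack along the batch axis and map [0, 1] -> [-1, +1].
img_stack = 2 * np.concatenate([img1, img2], axis=0) - 1

# Mask: stack, then keep a single channel, like [:, 0, :, :] in the torch code.
mask1 = np.random.rand(1, 3, 8, 8)
mask_stack = np.concatenate([mask1, np.flip(mask1, axis=-1)], axis=0)[:, 0, :, :]

print(img_stack.shape)   # (2, 3, 8, 8)
print(mask_stack.shape)  # (2, 8, 8)
```

The batch dimension of `img_stack` must match the length of the prompt list, which is why step 2 passes two prompts for the two stacked images.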