Closed satwiksunnam19 closed 1 year ago
Hey! Thanks.
As long as you don't exceed the GPU's memory limit, you can do that by supplying a list of prompts through the interface.
Here is an example I ran with the canny edge setup from the supplied Colab notebook:
```python
import torch
import matplotlib.pyplot as plt

pipe.to('cuda')
text_prompt = ["squirrel", "elephant"]

# generate one image per prompt (shared conditioning inputs)
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    controlnet_conditioning_scale=0.5,
    mask_image=mask_image
).images

plt.subplot(1, 2, 1)
plt.imshow(new_image[0])
plt.subplot(1, 2, 2)
plt.imshow(new_image[1])
```
with the following result:
I don't know if this was your question or whether you were asking about a more efficient way to compute this. That said, it is unlikely I will be making further changes in this respect, as I want this pipeline to resemble the related diffusers
pipelines (inpainting and controlnet) as much as possible. Open to discussion, though!
Does this work with a standard GPU, such as a T4?
Yes
Also, in case you're looking for an example with multiple images and multiple prompts, you can actually do that by supplying:
1. Inputs

```python
import torch
from torchvision.transforms import ToTensor

# convert each PIL image to a (1, C, H, W) tensor
img1 = ToTensor()(image).unsqueeze_(0)
mask1 = ToTensor()(mask_image).unsqueeze_(0)
canny1 = ToTensor()(canny_image).unsqueeze_(0)

# second sample: the first one flipped along the width axis
img2 = torch.flip(img1, [-1])
mask2 = torch.flip(mask1, [-1])
canny2 = torch.flip(canny1, [-1])

img_stack = 2 * torch.cat([img1, img2], 0) - 1  # convert to [-1,+1] range
mask_stack = torch.cat([mask1, mask2], 0)[:, 0, :, :]
canny_stack = torch.cat([canny1, canny2], 0)
```
>💡 In this example, the second image is the first image **horizontally flipped**
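To sanity-check the stacked tensors before calling the pipeline, you can verify their shapes and value ranges with plain torch. This is a sketch with random tensors standing in for the actual `ToTensor()` outputs; the 3×64×64 size is an arbitrary assumption.

```python
import torch

# stand-ins for ToTensor()(image).unsqueeze_(0): shape (1, C, H, W), values in [0, 1]
img1 = torch.rand(1, 3, 64, 64)
mask1 = torch.rand(1, 3, 64, 64)

# second sample: the first one flipped horizontally (last dim is width)
img2 = torch.flip(img1, [-1])
mask2 = torch.flip(mask1, [-1])

img_stack = 2 * torch.cat([img1, img2], 0) - 1      # images rescaled to [-1, +1]
mask_stack = torch.cat([mask1, mask2], 0)[:, 0, :, :]  # masks keep one channel

print(img_stack.shape)   # torch.Size([2, 3, 64, 64])
print(mask_stack.shape)  # torch.Size([2, 64, 64])
```

Flipping twice recovers the original, so `torch.flip(img2, [-1])` equals `img1` exactly.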
2. Pipeline
```python
text_prompt = ["squirrel", "elephant"]

# generate one image per (prompt, conditioning) pair
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=img_stack,
    control_image=canny_stack,
    controlnet_conditioning_scale=0.5,
    mask_image=mask_stack
).images

for idx, img in enumerate(new_image):
    plt.subplot(1, len(new_image), 1 + idx)
    plt.imshow(img)
```
Hello @mikonvergence, your work is awesome, and I have a query about an issue that has been on my mind for days.
I have 10-15 different prompts that I want to run on a single image, but on a T4 GPU the memory runs out (fragments) even for a single image with a single prompt.
Thanks and regards, Satwik Sunnam.
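One common way to keep peak GPU memory low when running many prompts (a sketch, not something from this thread) is to split the prompt list into small chunks and call the pipeline once per chunk, collecting the outputs. Here `run_chunk` is a hypothetical stand-in for the actual `pipe(...)` call:

```python
def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_prompts(prompts, run_chunk, batch_size=4):
    """Run prompts a few at a time so only `batch_size` images are in GPU memory at once.

    `run_chunk` is a hypothetical callable standing in for something like
    lambda p: pipe(p, image=image, control_image=canny_image, mask_image=mask_image).images
    """
    results = []
    for chunk in chunked(prompts, batch_size):
        results.extend(run_chunk(chunk))
    return results

# demo with a dummy stand-in for the pipeline call
prompts = [f"prompt {i}" for i in range(15)]
outputs = run_prompts(prompts, run_chunk=lambda c: [p.upper() for p in c])
print(len(outputs))  # 15
```

With `batch_size=1` this degenerates to one prompt per call, which is the lowest-memory (and slowest) option.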