cubiq opened 1 year ago
@cubiq hi, can you give a description of how to implement multiple images?
you can send a batch of tensors with a shape like (4, 224, 224, 3)
for 4 images (you can just stack them).
this is an example with 4 images:
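The stacking described above can be sketched in a few lines (random tensors stand in for real decoded images, just to show the shapes):

```python
import torch

# Four images as (224, 224, 3) tensors (decoded/normalized elsewhere);
# random data here only to illustrate the shapes.
images = [torch.rand(224, 224, 3) for _ in range(4)]

# Stack along a new leading batch dimension -> (4, 224, 224, 3)
batch = torch.stack(images, dim=0)
print(batch.shape)  # torch.Size([4, 224, 224, 3])
```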
@cubiq is this achieved by concatenating the image features of the 4 images (16 x 4 = 64 tokens)? I saw another blog https://civitai.com/articles/2345, and I'm not sure whether your implementation is the same as that one.
yes exactly.
the blog you linked uses more or less the same code as mine.
Could someone please provide an example of how to use multiple images outside of Comfy?
Would something like this work perhaps? Is my understanding correct?
```python
import torch
from PIL import Image
from torchvision import transforms

image_paths = ["img1", "img2", "img3", "img4"]
image_tensors = []
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])
for path in image_paths:
    image = Image.open(path)
    image_tensor = transform(image)
    image_tensors.append(image_tensor)

image_batch = torch.stack(image_tensors)
print(image_batch.shape)  # Should print torch.Size([4, 3, 224, 224])

images = ip_model.generate(pil_images=image_batch, num_samples=num_samples, num_inference_steps=30, seed=42)
```
yes, stacking should work
Hmm, since the get_image_embeds method is designed to work with PIL images, would something like this suffice?
pil_images = [Image.open(path).resize((224, 224)) for path in image_paths]
btw looks like Comfy is using this node type for image batching: https://github.com/comfyanonymous/ComfyUI/blob/213976f8c3ea3f45f0c692dd8aac2fd9fea433e3/nodes.py#L1490
```python
class ImageBatch:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"image1": ("IMAGE",), "image2": ("IMAGE",)}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "batch"
    CATEGORY = "image"

    def batch(self, image1, image2):
        if image1.shape[1:] != image2.shape[1:]:
            image2 = comfy.utils.common_upscale(image2.movedim(-1, 1), image1.shape[2], image1.shape[1], "bilinear", "center").movedim(1, -1)
        s = torch.cat((image1, image2), dim=0)
        return (s,)
```
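The node's behavior (resize the second image to the first's spatial size, then concatenate along the batch dimension) can be reproduced in plain PyTorch without ComfyUI; a sketch assuming (B, H, W, C) tensors as ComfyUI uses:

```python
import torch
import torch.nn.functional as F

def batch_images(image1: torch.Tensor, image2: torch.Tensor) -> torch.Tensor:
    """Concatenate two (B, H, W, C) image tensors along the batch dim,
    bilinearly resizing image2 to image1's spatial size if needed."""
    if image1.shape[1:] != image2.shape[1:]:
        # F.interpolate expects (B, C, H, W), so move channels first and back
        image2 = F.interpolate(
            image2.movedim(-1, 1),
            size=(image1.shape[1], image1.shape[2]),
            mode="bilinear",
            align_corners=False,
        ).movedim(1, -1)
    return torch.cat((image1, image2), dim=0)

a = torch.rand(1, 224, 224, 3)
b = torch.rand(1, 128, 128, 3)
print(batch_images(a, b).shape)  # torch.Size([2, 224, 224, 3])
```

(F.interpolate stands in for comfy.utils.common_upscale here; the center-crop option of the original helper is not replicated.)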
So far my results are terrible 🥲
it looks like you are sending the cond images to the uncond as well
When I try this method, the output images are generated based on distinct images of the batch. However, as I understand it, the output images should be a synthesis of all the images in the batch. Which is it? If it's the latter, how should I pass the images to the IP-Adapter generate function?
please note that I work mainly with comfyui so I wasn't aware of the diffusers situation.
I had a quick look at the code and it seems that only the first image is prompted no matter how many images are sent. The image encoder works as expected and correctly encodes all the images so there must be some clipping happening somewhere down the line.
I'll work on a diffuser project soon so I might be looking into this
Thanks @cubiq! If you have a github sponsor or kofi link I would be happy to help support development of this feature!
I was able to implement image weighting in ComfyUI @xiaohu2015
In the image below you can see two different results using the same 2 images with different weights
as always the code here https://github.com/cubiq/ComfyUI_IPAdapter_plus
Regarding Diffusers, things are a bit more complicated: the current implementation offered by tencent-ailab is a bit too "rigid" and would require some refactoring. A more dynamic approach would be for the library to only export the embeds, which the user could then integrate into any pipeline (given the right combination of encoder / IP-Adapter model / main checkpoint). Alternatively, the official API should be followed a bit more closely. I'll look into it in the coming days
Okay good news, I was able to replicate all the comfyui features in diffusers.
On the left is the diffusers image that as you can see is very close to the image on the right generated with 2 images in ComfyUI (the difference in sharpness is caused by different sampling algorithms used for the image encoder).
It's just a matter of merging the image embeds and increasing the number of tokens in the attention processor. I'll do some code cleanup and post the code somewhere!
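The merging step can be sketched as concatenating the per-image embeds along the token dimension (shapes are assumptions; 16 tokens per image matches the "16 x 4 = 64 tokens" note earlier in the thread):

```python
import torch

# Per-image embeds from the image encoder + projection:
# each is (batch, tokens_per_image, dim); random data just to show shapes.
embeds = [torch.rand(1, 16, 768) for _ in range(4)]

# Merge by concatenating along the token dimension -> (1, 64, 768);
# the attention processor's token count must then be raised to 64.
merged = torch.cat(embeds, dim=1)
print(merged.shape)  # torch.Size([1, 64, 768])
```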
That is huge 💪 Can't wait to test it out!
More info about the new code here #99
You can find the code and examples in my repo https://github.com/cubiq/Diffusers_IPAdapter
It's still a bit experimental, but should be enough to get you started. Have fun!
When sending a batch of images for the conditioning, how would you go about giving a different weight to each of them?
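One simple approach (an assumption, not necessarily how the linked repo implements its weighting) is to scale each image's embed by its weight before concatenating along the token dimension:

```python
import torch

embeds = [torch.rand(1, 16, 768) for _ in range(3)]  # one embed per image
weights = [1.0, 0.5, 0.25]                           # per-image weights

# Scale each image's tokens by its weight, then merge along the token dim
weighted = torch.cat([w * e for w, e in zip(weights, embeds)], dim=1)
print(weighted.shape)  # torch.Size([1, 48, 768])
```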