tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
4.51k stars, 297 forks

What is the difference between regular and `plus`? #108

Open landmann opened 8 months ago

landmann commented 8 months ago

You should add this to the README; right now it's quite vague why you'd use one vs. the other.

MadaraxUchiha88 commented 8 months ago

I believe I can answer this since I've been using IP-Adapter for quite some time now. IP-Adapter SD15 makes your image a blend of both your reference image and whatever prompt (or LoRA) you're using. IP-Adapter SD15 Plus makes it closer to the reference image, and IP-Adapter SD15 Light makes it closer to your prompt or LoRA. Hope that helps :)

xiaohu2015 commented 8 months ago

@landmann That's it

HelenWu99 commented 8 months ago

Hello, I would like to know where the file for "IP-Adapter SD15 Light" is. I can't find anything related to "IP-Adapter SD15 Light" in the currently released code.

xiaohu2015 commented 8 months ago

https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter_sd15_light.bin

hchasens commented 4 months ago

Just for accuracy I thought I'd post a quote from the model itself.

The Plus model is not intended to be seen as a "better" IP Adapter model - Instead, it focuses on passing in more fine-grained details (like positioning) versus "general concepts" in the image.

From the sound of it, the normal ip-adapter focuses on concepts, almost like a LoRA (though not nearly as good, I'd imagine). The ip-adapter-plus focuses on not just concepts but also positioning, almost like a mix between a LoRA and a ControlNet. That's what I got from it anyway.

For example, if you had a cyberpunk city street and wanted to apply that specific cyberpunk theme to a character, you'd use ip-adapter, whereas ip-adapter-plus might actually make it harder, since there wouldn't be any character in the original source.

Jannchie commented 2 months ago

I wonder how the plus and regular versions differ in terms of algorithm and model structure?

chen-xin-94 commented 1 month ago

From my understanding of the code of IPAdapter and IPAdapterPlus, the regular version applies a simple linear projection to the CLIP image embedding to produce the image features, as described in the paper. The plus version instead applies a Perceiver Resampler (similar to the one used in Flamingo) to the features from the second-to-last hidden layer of the CLIP image encoder.
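
A toy sketch of that difference, in plain Python so it runs without any dependencies. This is not the repository's code: the function names, the tiny dimensions, and the token counts are all made up for illustration, and the resampler is reduced to a single cross-attention step with no learned key/value projections, no feed-forward layers, and no normalization.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Random stand-in for learned weights (hypothetical, illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear_projection(pooled, weight, num_tokens):
    """Regular-style sketch: one pooled image embedding is multiplied by a
    single weight matrix, and the result is split into num_tokens tokens."""
    flat = [sum(w * x for w, x in zip(row, pooled)) for row in weight]
    dim = len(flat) // num_tokens
    return [flat[i * dim:(i + 1) * dim] for i in range(num_tokens)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(latents, patches):
    """Plus-style sketch: each latent query attends over all patch tokens
    (the per-patch hidden states) and returns a weighted average of them."""
    dim = len(latents[0])
    tokens = []
    for query in latents:
        scores = [sum(q * p for q, p in zip(query, patch)) / math.sqrt(dim)
                  for patch in patches]
        weights = softmax(scores)
        tokens.append([sum(w * patch[d] for w, patch in zip(weights, patches))
                       for d in range(dim)])
    return tokens


rng = random.Random(0)
dim = 8                                             # toy embedding size
pooled = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # one global embedding
patches = rand_matrix(5, dim, rng)                  # five per-patch features

# Regular-style: every output token is derived from the single pooled vector.
regular_tokens = linear_projection(pooled, rand_matrix(4 * dim, dim, rng), num_tokens=4)

# Plus-style: every output token is a different mixture of the patch features.
plus_tokens = resample(rand_matrix(16, dim, rng), patches)

print(len(regular_tokens), len(plus_tokens))
```

The point of the sketch is the input each path sees: the linear projection only ever receives one global vector per image, while the resampler's queries can pull from individual patch features, which fits the earlier observation that the plus model carries more fine-grained detail such as positioning.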