tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
4.46k stars, 289 forks

Basic understanding of the IP adapter during image generation #377

Open StableQuestion opened 4 weeks ago

StableQuestion commented 4 weeks ago

Hey everyone, I'm trying to understand the IP adapter better. Maybe someone can help me :)

Paper:

https://arxiv.org/pdf/2308.06721.pdf

Would it be right to say:

1) An IP adapter model (e.g. ip-adapter_sdxl.bin) consists of a projection network (a linear layer and a normalization layer) and adapted modules (with decoupled cross-attention)?

2) The modules marked in red in the image represent what the IP adapter model (e.g. ip-adapter_sdxl.bin) does during image generation?

As you can probably tell, I have no background in machine learning. I work with ComfyUI and read the paper out of interest. But I'm familiar with linear algebra if it gets mathematical :)

[Attached image "fig": the image with the modules in question marked in red]

xiaohu2015 commented 3 weeks ago

yes
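
For readers who prefer code to diagrams, here is a minimal sketch of the two pieces confirmed above, written in plain Python with no ML library. It is an illustration under simplifying assumptions, not the repository's implementation: the function names are hypothetical, the linear layer has no bias, the layer norm has no learned scale or shift, and keys and values are passed in directly instead of being computed from the tokens by learned projection matrices. The idea it shows follows the paper: the image embedding is projected into a few extra tokens, and each cross-attention layer adds a second attention over those image tokens to the usual text attention, reusing the same queries.

```python
import math

def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def project_image_embedding(image_embed, W, num_tokens):
    """Projection network: linear layer, split into tokens, layer norm per token."""
    flat = matvec(W, image_embed)
    dim = len(flat) // num_tokens
    tokens = [flat[i * dim:(i + 1) * dim] for i in range(num_tokens)]
    return [layer_norm(t) for t in tokens]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def decoupled_cross_attention(queries, text_kv, image_kv, scale=1.0):
    """Text attention plus scaled image attention, sharing the same queries."""
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [[t + scale * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(text_out, image_out)]

# Toy usage: a 2-dim "image embedding" projected into 2 tokens of size 2.
image_tokens = project_image_embedding(
    [1.0, 2.0], [[1, 0], [0, 1], [1, 1], [1, -1]], num_tokens=2)
queries = [[1.0, 0.0]]
text_kv = ([[1.0, 0.0]], [[2.0, 3.0]])      # (keys, values) from the text prompt
image_kv = (image_tokens, image_tokens)     # simplification: tokens used as keys and values
print(decoupled_cross_attention(queries, text_kv, image_kv, scale=0.5))
```

Setting `scale=0` switches the image branch off and leaves only the ordinary text cross-attention. This is the knob that shows up as the IP adapter weight in most user interfaces.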