tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
5.08k stars, 331 forks

The implementation of decoupled cross-attention #311

Open xingyueye opened 6 months ago

xingyueye commented 6 months ago

@kovalexal In the paper, decoupled cross-attention passes the text and image features through separate linear layers, performs cross-attention on each, and then adds the two results. In the code, however, the text and image embeddings are concatenated and passed directly to the UNet, as follows: https://github.com/tencent-ailab/IP-Adapter/blob/5a18b1f3660acaf8bee8250692d6fb3548a19b14/tutorial_train.py#L118 Could you please explain this implementation detail?

xiaohu2015 commented 6 months ago

https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter/attention_processor.py#L137
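
For readers following the link, here is one way the two views can be reconciled: the text and image tokens can travel through the UNet as a single concatenated sequence and still be split apart again inside the attention layer, so each part gets its own key/value projections. The block below is a minimal single-head, unbatched NumPy sketch under that assumption. It is not the repository's code: `decoupled_cross_attention`, `num_image_tokens`, the weight dictionary `w`, and all shapes are hypothetical names chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention. q: (n_q, d); k, v: (n_kv, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def decoupled_cross_attention(hidden, context, num_image_tokens, w, scale=1.0):
    # Assumption: the last `num_image_tokens` rows of `context` are image
    # tokens and everything before them is text tokens.
    end_pos = context.shape[0] - num_image_tokens
    text_tokens, image_tokens = context[:end_pos], context[end_pos:]

    # One shared query projection, separate key/value projections per modality.
    q = hidden @ w["q"]
    text_out = attention(q, text_tokens @ w["k"], text_tokens @ w["v"])
    image_out = attention(q, image_tokens @ w["k_ip"], image_tokens @ w["v_ip"])

    # The two attention results are added; `scale` weights the image branch.
    return text_out + scale * image_out


# Toy data: 5 latent positions, 7 text tokens, 4 image tokens, width 8.
rng = np.random.default_rng(0)
d = 8
w = {name: rng.standard_normal((d, d)) for name in ("q", "k", "v", "k_ip", "v_ip")}
hidden = rng.standard_normal((5, d))
text = rng.standard_normal((7, d))
image = rng.standard_normal((4, d))

# Concatenate once, outside the attention layer, as the question describes.
context = np.concatenate([text, image], axis=0)
out = decoupled_cross_attention(hidden, context, num_image_tokens=4, w=w)
```

In this sketch the concatenation is only a transport step: because the split happens before the key/value projections, the result equals running two separate cross-attentions and summing them, and setting `scale=0.0` recovers text-only cross-attention.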

xingyueye commented 6 months ago

Got it, thanks.