@kovalexal In the paper, the decoupled cross-attention sends the text and image embeddings through separate linear layers, performs cross-attention on each, and then adds the results. In the code implementation, however, the embeddings appear to be concatenated and passed to the UNet directly, as here:
https://github.com/tencent-ailab/IP-Adapter/blob/5a18b1f3660acaf8bee8250692d6fb3548a19b14/tutorial_train.py#L118
Could you please explain this implementation detail?
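For reference, here is a minimal sketch of what I understand the paper to describe: separate K/V linear layers for text and image tokens, two independent cross-attentions against the same query, and a sum of the two outputs. All names and dimensions here are my own illustration, not the repo's actual code.

```python
import torch
import torch.nn as nn


class DecoupledCrossAttention(nn.Module):
    """Conceptual sketch of decoupled cross-attention (not the repo's code):
    text and image tokens get their own K/V projections, and the two
    attention outputs are summed."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Separate linear layers for text vs. image tokens, per the paper
        self.to_k_text = nn.Linear(dim, dim, bias=False)
        self.to_v_text = nn.Linear(dim, dim, bias=False)
        self.to_k_image = nn.Linear(dim, dim, bias=False)
        self.to_v_image = nn.Linear(dim, dim, bias=False)

    def _attend(self, q, k, v):
        # Standard scaled dot-product attention over multiple heads
        b, n, _ = q.shape
        m = k.shape[1]
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, m, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, m, self.num_heads, self.head_dim).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v
        return out.transpose(1, 2).reshape(b, n, -1)

    def forward(self, x, text_tokens, image_tokens):
        q = self.to_q(x)
        out_text = self._attend(
            q, self.to_k_text(text_tokens), self.to_v_text(text_tokens)
        )
        out_image = self._attend(
            q, self.to_k_image(image_tokens), self.to_v_image(image_tokens)
        )
        # Decoupled: the two cross-attention results are added together
        return out_text + out_image
```

Whereas in `tutorial_train.py` the image tokens seem to be concatenated with the text tokens into a single `encoder_hidden_states` tensor before entering the UNet, which looks like a single joint cross-attention rather than two separate ones.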