Open TimeLessLing opened 1 year ago
Thank you very much for your great work! I ran into a question while reading the source code: what is the role of `num_tokens`?

I found the `num_tokens` parameter in `IPAttnProcessor` in attention_processor.py. The only place it is used is in the forward pass, where it splits `ip_hidden_states` off from the original hidden states for the new attention computation, corresponding to formulas (4) and (5) in the paper. If I understand correctly, the new IP attention in formula (5) should be computed on image features, but in the code the last `num_tokens` entries are simply cut off from the mixed hidden states. How is it guaranteed that the last `num_tokens` part of the hidden states corresponds only to image features?

Thank you very much.
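P.S. For reference, my reading of the decoupled cross-attention in formulas (4) and (5) is roughly the following (paraphrased from memory, so the notation may differ slightly from the paper):

$$
Z^{\text{new}} = \mathrm{Attention}(Q, K, V) + \mathrm{Attention}(Q, K', V'),
\qquad K' = c_i W'_K,\quad V' = c_i W'_V,
$$

where $c_i$ are the image features, so only the image tokens should feed the second attention term.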
Hi, the mixed hidden states are the text features concatenated with the image features: the text features have 77 tokens, while the image features have 4 tokens (16 tokens for IP-Adapter Plus). Because the image tokens are always appended after the text tokens, we can split the text features and image features using `num_tokens`:

```python
end_pos = encoder_hidden_states.shape[1] - self.num_tokens
encoder_hidden_states, ip_hidden_states = (
    encoder_hidden_states[:, :end_pos, :],
    encoder_hidden_states[:, end_pos:, :],
)
```
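To make that invariant concrete, here is a minimal standalone sketch (the tensor names and shapes are made up for illustration; this is not the repository's actual pipeline code) showing that appending the image tokens after the text tokens lets the slice above recover exactly the two parts:

```python
import torch

# Hypothetical shapes: batch 2, 77 text tokens, 4 image tokens
# (as in the base IP-Adapter), hidden dim 768.
text_embeds = torch.randn(2, 77, 768)   # stand-in for the text encoder output
image_embeds = torch.randn(2, 4, 768)   # stand-in for the projected image tokens
num_tokens = image_embeds.shape[1]

# The pipeline appends the image tokens AFTER the text tokens, so the
# last num_tokens positions are image features by construction.
encoder_hidden_states = torch.cat([text_embeds, image_embeds], dim=1)  # (2, 81, 768)

# The split inside IPAttnProcessor then recovers both parts exactly.
end_pos = encoder_hidden_states.shape[1] - num_tokens
text_part = encoder_hidden_states[:, :end_pos, :]
ip_part = encoder_hidden_states[:, end_pos:, :]

assert torch.equal(text_part, text_embeds)
assert torch.equal(ip_part, image_embeds)
```

Since the concatenation is done once, before the UNet forward pass, every cross-attention layer sees the same token layout and the slice is always valid.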