lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in PyTorch
MIT License

Input and Context size in CrossAttention #68

Closed · caffeinetoomuch closed 3 years ago

caffeinetoomuch commented 3 years ago

Do the input tensor size and the context size have to be the same for CrossAttention (causal)? If they don't, this line would definitely cause an exception, since einsum does not support broadcasting by default. I have been tokenizing with padding to the same max length for both in order to run my model with Performer.

I am unsure whether this has to be fixed on the repo side, or whether the same size simply has to be used for causal attention. Thanks in advance for this great repository!
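For concreteness, a minimal sketch of the setup I mean (sizes are hypothetical, and this assumes `CrossAttention` accepts the `causal` flag of the underlying attention module):

```python
import torch
from performer_pytorch import CrossAttention

# hypothetical setup: causal flag on cross attention, with a context
# whose sequence length differs from the input's
attn = CrossAttention(
    dim = 512,
    heads = 8,
    causal = True
)

x = torch.randn(1, 1024, 512)       # input: sequence length 1024
context = torch.randn(1, 512, 512)  # context: sequence length 512

# as reported above, the causal linear attention einsum assumes matching
# sequence lengths, so this should raise a shape error unless both are
# padded to the same max length
out = attn(x, context = context)
```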

lucidrains commented 3 years ago

Hey! Cross attention actually cannot be causal - could you show me a reproducible script for the error you are running into?
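For reference, the non-causal cross attention path already accepts a context of a different length than the input; a minimal sketch along the lines of the README usage:

```python
import torch
from performer_pytorch import CrossAttention

attn = CrossAttention(
    dim = 512,
    heads = 8
)  # causal defaults to False

x = torch.randn(1, 1024, 512)       # queries: sequence length 1024
context = torch.randn(1, 512, 512)  # keys/values: sequence length 512

out = attn(x, context = context)    # -> (1, 1024, 512), no shared padding needed
```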

caffeinetoomuch commented 3 years ago

Sorry for the late reply. Yeah, you are right, that was my misunderstanding of causal attention :sweat_smile: I will close this issue.

Thanks!