caffeinetoomuch closed this issue 3 years ago
Hey! Cross attention actually cannot be causal - could you show me a reproducible script for the error you are running into?
Sorry for the late reply. Yeah, you are right, that was my misunderstanding of causal attention :sweat_smile: I will close this issue.
Thanks!
Do the input tensor and the context have to be the same size for CrossAttention(causal)? If they don't, this line would definitely raise an exception, since einsum does not broadcast mismatched sequence lengths. So far I have been tokenizing both with padding to the same max length in order to run my model with performer.
I am not sure whether this needs to be fixed on the repo side, or whether the same size has to be used in order to use causal attention. Thanks in advance for this great repository!
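To illustrate the shape question, here is a minimal NumPy sketch of linear attention. It is a simplified illustration, not the repository's actual code: the function names `causal_linear_attention` and `linear_attention` are made up for this example, and random positive features stand in for the kernel feature maps. In the causal variant the prefix sums over keys and values share their sequence index with the queries, so the query length `n` has to equal the context length `m`. The non-causal variant sums over the whole context, so the two lengths can differ.

```python
import numpy as np


def causal_linear_attention(q, k, v):
    """Causal sketch: position i only sees keys/values 0..i (hypothetical helper)."""
    # Prefix sums along the key/value sequence axis.
    k_cumsum = np.cumsum(k, axis=0)                              # (m, d)
    context = np.cumsum(np.einsum('md,me->mde', k, v), axis=0)   # (m, d, e)
    # The query index n is reused for the prefix sums, so n must equal m.
    d_inv = 1.0 / np.einsum('nd,nd->n', q, k_cumsum)
    return np.einsum('nde,nd,n->ne', context, q, d_inv)


def linear_attention(q, k, v):
    """Non-causal sketch: every query sees the whole context (hypothetical helper)."""
    k_sum = k.sum(axis=0)                      # (d,)
    context = np.einsum('md,me->de', k, v)     # (d, e)
    d_inv = 1.0 / np.einsum('nd,d->n', q, k_sum)
    return np.einsum('de,nd,n->ne', context, q, d_inv)


rng = np.random.default_rng(0)
n, m, d = 4, 6, 8                 # query length differs from context length
q = rng.random((n, d)) + 0.1      # strictly positive stand-ins for feature maps
k = rng.random((m, d)) + 0.1
v = rng.random((m, d))

# Non-causal cross attention accepts n != m.
out = linear_attention(q, k, v)

# The causal variant fails on the same inputs because the lengths differ.
try:
    causal_linear_attention(q, k, v)
    causal_error = None
except ValueError as err:
    causal_error = err

# With equal lengths (self-attention) the causal variant runs.
self_out = causal_linear_attention(k, k, v)
```

Under these assumptions, padding both inputs to the same length only makes the shapes line up. It does not give the causal mask a meaning for cross attention, since query position i and context position i need not refer to the same step of one sequence.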