Open · charlesxu90 opened this issue 2 years ago
@charlesxu90 Yeah, it works.
@lucidrains Thanks for answering me. Really appreciate it!
Causal self-attention requires a triangular attention mask to hide future tokens. In this code, I did find the interface you left for input_mask. However, I couldn't find where the causal attention mask is actually created, and that's what confuses me.
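For concreteness, here is the kind of mask I mean. This is just a minimal PyTorch sketch of standard softmax causal attention, not code from this repo, and the function names are my own:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # True above the diagonal marks future positions to be hidden.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

def causal_softmax_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, dim)
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale        # full (n x n) score matrix
    mask = causal_mask(q.shape[-2]).to(q.device)
    scores = scores.masked_fill(mask, float('-inf'))  # block attention to the future
    return scores.softmax(dim=-1) @ v
```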
Thanks.
I found out that efficient attention doesn't work in the causal-attention setting, as mentioned here: https://github.com/cmsflash/efficient-attention/issues/4

So I doubt whether the causal attention in this code really works.
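As far as I understand the linked issue, the reason is that efficient attention never forms the (n x n) score matrix at all: keys and values are collapsed into one global context matrix first, so there is nowhere to apply a triangular mask. The usual workaround in the linear-attention literature is a cumulative-sum formulation instead of a mask. A minimal sketch of both points, assuming PyTorch; the causal variant below follows the Katharopoulos et al. style prefix-sum trick and is my assumption, not necessarily what this repo implements:

```python
import torch
import torch.nn.functional as F

def efficient_attention(q, k, v):
    # Shen et al.'s efficient attention: keys/values are first collapsed into
    # one global (d_k x d_v) context matrix that sums over ALL positions,
    # including future ones, so there is no (n x n) score matrix to mask.
    context = k.softmax(dim=-2).transpose(-2, -1) @ v  # (..., d_k, d_v)
    return q.softmax(dim=-1) @ context

def causal_linear_attention(q, k, v, eps=1e-6):
    # Causality via prefix sums: position t only aggregates keys/values <= t.
    q, k = F.elu(q) + 1, F.elu(k) + 1                  # positive feature maps
    kv = torch.einsum('...nd,...ne->...nde', k, v).cumsum(dim=-3)  # running sum of k_t v_t^T
    z = k.cumsum(dim=-2)                               # running normalizer
    num = torch.einsum('...nd,...nde->...ne', q, kv)
    den = torch.einsum('...nd,...nd->...n', q, z).unsqueeze(-1).clamp(min=eps)
    return num / den
```

So unless the code does something like the second version internally, passing a triangular mask through input_mask wouldn't make efficient attention causal.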