Open wajihullahbaig opened 3 years ago
@wajihullahbaig yup! in linear attention, you do this with a cumulative sum instead of the triangular mask!
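For anyone reading later, here is a rough sketch of why the cumulative sum gives the same result as a triangular mask. It is not the repository's code: it assumes unnormalized linear attention with an identity feature map, and the function names (`masked_causal`, `cumsum_causal`) are made up for illustration. Real implementations usually also apply a feature map to the queries and keys and divide by a running normalizer.

```python
# Sketch only: unnormalized linear attention, identity feature map (assumption).
# masked_causal / cumsum_causal are hypothetical names, not from any library.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def masked_causal(q, k, v):
    """Quadratic form: build each row of the n x n score matrix and zero out j > i."""
    n, e = len(q), len(v[0])
    out = []
    for i in range(n):
        # Row i of the lower-triangular (masked) score matrix.
        scores = [dot(q[i], k[j]) if j <= i else 0.0 for j in range(n)]
        out.append([sum(scores[j] * v[j][c] for j in range(n)) for c in range(e)])
    return out

def cumsum_causal(q, k, v):
    """Linear form: keep a running sum of outer(k_j, v_j) for j <= i; no mask is built."""
    d, e = len(k[0]), len(v[0])
    state = [[0.0] * e for _ in range(d)]  # cumulative sum of key-value outer products
    out = []
    for q_i, k_i, v_i in zip(q, k, v):
        for a in range(d):
            for b in range(e):
                state[a][b] += k_i[a] * v_i[b]
        # Position i only sees keys/values up to and including i.
        out.append([sum(q_i[a] * state[a][b] for a in range(d)) for b in range(e)])
    return out

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]]
v = [[1.0], [2.0], [3.0]]

print(masked_causal(q, k, v))  # [[1.0], [4.0], [12.0]]
print(cumsum_causal(q, k, v))  # [[1.0], [4.0], [12.0]]
```

Both functions return the same output, but the second one never materializes the n x n attention matrix, which is the point of the cumulative-sum formulation.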
Much appreciated, thanks for the reply!
Thanks!
Naive question! Is `causal = True` used to create a mask that clips the upper-right triangle of the attention matrix (everything above the diagonal)?
Thank you!