lucidrains / linear-attention-transformer

Transformer based on a variant of attention that is linear in complexity with respect to sequence length
MIT License

causal = True #5

Open wajihullahbaig opened 3 years ago

wajihullahbaig commented 3 years ago

Naive question! Is `causal = True` used to create a mask that trims/clips the upper-right (future) half of the attention matrix?

Thank you!

lucidrains commented 3 years ago

@wajihullahbaig yup! in linear attention, you do this with a cumulative sum instead of the triangular mask!
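To illustrate the idea, here is a minimal NumPy sketch (not the repo's actual implementation) of causal linear attention: instead of materializing the full attention matrix and applying a lower-triangular mask, running cumulative sums of the key/value outer products give each position access only to earlier positions. The `elu(x) + 1` feature map is one common choice for keeping features positive; names like `causal_linear_attention` are illustrative.

```python
import numpy as np

def causal_linear_attention(q, k, v, eps=1e-6):
    """Causal linear attention via cumulative sums.

    q, k, v: arrays of shape (seq_len, dim).
    The cumsum over outer products phi(k_j) v_j^T plays the role
    of the triangular mask in ordinary softmax attention.
    """
    # positive feature map: elu(x) + 1
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)

    # running sums over the sequence replace the triangular mask:
    # kv[i] = sum_{j <= i} k_j v_j^T, z[i] = sum_{j <= i} k_j
    kv = np.cumsum(k[:, :, None] * v[:, None, :], axis=0)  # (n, d, d_v)
    z = np.cumsum(k, axis=0)                               # (n, d)

    num = np.einsum('nd,nde->ne', q, kv)                   # q_i^T kv_i
    den = np.einsum('nd,nd->n', q, z)[:, None] + eps       # q_i^T z_i
    return num / den
```

The result matches an explicit quadratic formulation that masks the future with `np.tril`, but the cumsum version never builds the full `n × n` matrix and can be computed incrementally at inference time.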

wajihullahbaig commented 3 years ago

> @wajihullahbaig yup! in linear attention, you do this with a cumulative sum instead of the triangular mask!

Much appreciated!

Thanks!