idiap / fast-transformers

PyTorch library for fast transformer implementations

Causal attention is cheating by looking in the future #111

Closed · jogardi closed this 2 years ago

jogardi commented 2 years ago

On line 95 of causal_linear_attention.py, the normalizing constant is computed from all queries and keys. This creates a dependency on future values, which would be unknown in a real prediction task.

angeloskath commented 2 years ago

Hi,

This is not the case. Since the normalizing constant is computed with a cumsum over K, each position uses information only up to the current time step, not from the future.

Best, Angelos
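
To illustrate the point, here is a minimal NumPy sketch of causal linear attention (not the library's actual implementation; the function name, shapes, and the elu(x)+1 feature map are illustrative, following the linear-attention formulation). Because both the value accumulator and the normalizer are cumulative sums over past keys, the output at step t cannot depend on steps after t:

```python
import numpy as np

def causal_linear_attention(Q, K, V, eps=1e-6):
    """Illustrative causal linear attention.

    Q, K: (T, D) queries/keys; V: (T, M) values.
    The feature map elu(x)+1 keeps features positive.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Q, K = phi(Q), phi(K)
    # Running outer-product sum over past keys/values: (T, D, M)
    S = np.cumsum(K[:, :, None] * V[:, None, :], axis=0)
    # Running sum of keys for the normalizer: (T, D)
    Z = np.cumsum(K, axis=0)
    num = np.einsum('td,tdm->tm', Q, S)        # numerator per step
    den = np.einsum('td,td->t', Q, Z) + eps    # normalizer per step
    return num / den[:, None]

# Causality check: perturbing the last time step must not change
# any earlier output, only the last one.
rng = np.random.default_rng(0)
T, D, M = 6, 4, 3
Q = rng.normal(size=(T, D))
K = rng.normal(size=(T, D))
V = rng.normal(size=(T, M))
out1 = causal_linear_attention(Q, K, V)
K2, V2 = K.copy(), V.copy()
K2[-1] += 10.0
V2[-1] -= 5.0
out2 = causal_linear_attention(Q, K2, V2)
assert np.allclose(out1[:-1], out2[:-1])       # past outputs unchanged
assert not np.allclose(out1[-1], out2[-1])     # only the last step differs
```

The check at the end is the crux of the thread: if the normalizer looked at all keys, perturbing a future key would change earlier outputs, but with the cumsum it does not.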