lucidrains / linear-attention-transformer

Transformer based on a variant of attention whose complexity is linear with respect to sequence length
MIT License
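Standard attention forms the n x n matrix QK^T before multiplying by V, so its cost grows quadratically with sequence length n. A common way to make it linear is to reorder the product. First compute K^T V, which is a small d x e matrix, and then multiply Q by it. The work is then O(n*d*e) and no n x n matrix is ever built. The sketch below is a minimal pure-Python illustration. It assumes the "efficient attention" normalization: each query is softmaxed over its features, and each key feature is softmaxed over the sequence. This is an illustration of the general idea, not this repository's implementation, and the function names are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [x / s for x in exps]

def linear_attention(q, k, v):
    """Hypothetical helper. q, k: n x d lists; v: n x e list. Returns n x e.

    Assumed normalization (efficient-attention style): softmax each query
    row over features, softmax each key feature (column) over the sequence.
    """
    n, d, e = len(q), len(q[0]), len(v[0])
    # Normalize each query over the feature dimension.
    q_norm = [softmax(row) for row in q]
    # Normalize each key feature over the sequence dimension: d x n.
    k_cols = [softmax([k[i][j] for i in range(n)]) for j in range(d)]
    # context = K^T V (d x e), one pass over the sequence: O(n*d*e).
    context = [[sum(k_cols[j][i] * v[i][c] for i in range(n)) for c in range(e)]
               for j in range(d)]
    # out = Q context (n x e), another O(n*d*e); no n x n matrix is formed.
    return [[sum(q_norm[i][j] * context[j][c] for j in range(d)) for c in range(e)]
            for i in range(n)]
```

Every normalized query row sums to 1, and so does every normalized key column. A V made entirely of ones should therefore return all ones, which is a quick sanity check. By associativity, the result equals (QK^T)V computed the quadratic way under the same normalization.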