lucidrains / rela-transformer

Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012
MIT License
49 stars, 7 forks
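
To make the idea behind ReLA concrete, here is a minimal pure-Python sketch of single-head rectified linear attention: the softmax over scaled dot-product scores is replaced by a ReLU, and the resulting output is RMS-normalized because the weights no longer sum to one. This is an illustration only, not this repository's code or API. The function name `rela_attention`, the `eps` parameter, and the omission of the learned gain and gating described in the paper are all assumptions made for brevity.

```python
import math

def rela_attention(q, k, v, eps=1e-8):
    """Hypothetical single-head ReLA sketch.

    q, k: lists of row vectors with shape (seq_len, dim)
    v:    list of row vectors with shape (seq_len, dim_v)
    Returns a list of row vectors with shape (seq_len, dim_v).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # ReLU instead of softmax: negative scores become exactly zero,
        # so a query can ignore keys entirely (sparse attention).
        weights = [max(0.0, scale * sum(a * b for a, b in zip(qi, kj))) for kj in k]
        # Weighted sum of values; the weights are NOT normalized to sum to 1.
        ctx = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))]
        # RMS normalization to stabilize the unnormalized output
        # (assumption: no learned gain or gate in this sketch).
        rms = math.sqrt(sum(c * c for c in ctx) / len(ctx) + eps)
        out.append([c / rms for c in ctx])
    return out

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [-1.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = rela_attention(q, k, v)
```

In this example the second query has a zero score against every key, so it attends to nothing and its output row is all zeros, which a softmax-based attention could not produce.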