lucidrains / linear-attention-transformer

Transformer based on a variant of attention that is linear in complexity with respect to sequence length
MIT License
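The linear complexity claimed in the description comes from the kernel-trick form of attention: with a positive feature map φ, attention can be computed as φ(Q)(φ(K)ᵀV), so the (d × d) matrix φ(K)ᵀV is formed once instead of the (n × n) matrix QKᵀ. Below is a minimal NumPy sketch of that idea, using the common φ(x) = elu(x) + 1 feature map; this is an illustrative assumption, not this repository's actual implementation.

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Kernel-trick linear attention sketch.

    q, k: (n, d) queries/keys; v: (n, e) values.
    Cost is O(n * d * e) rather than the O(n^2 * d) of softmax attention.
    """
    # Feature map phi(x) = elu(x) + 1, one common choice that keeps scores positive.
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))

    q, k = phi(q), phi(k)
    # Associativity: build (d, e) context matrix phi(K)^T V first,
    # avoiding the (n, n) attention matrix entirely.
    context = k.T @ v
    # Per-query normalizer phi(Q) (sum_i phi(K)_i), shape (n,).
    norm = q @ k.sum(axis=0)
    return (q @ context) / (norm[:, None] + eps)

rng = np.random.default_rng(0)
q = rng.standard_normal((128, 64))
k = rng.standard_normal((128, 64))
v = rng.standard_normal((128, 64))
out = linear_attention(q, k, v)  # shape (128, 64)
```

Because the (n × n) score matrix is never materialized, memory and time scale linearly in the sequence length n for fixed head dimension.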

implemented mup #20

Closed: thomasfortin1 closed this 6 months ago

thomasfortin1 commented 6 months ago

Oops, meant to make this for my own fork of the repo. Closed