lucidrains / linear-attention-transformer

Transformer based on a variant of attention that is linear in complexity with respect to sequence length
MIT License
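The linear complexity claimed in the description comes from the kernel-trick form of attention: with a positive feature map φ, attention can be computed as φ(Q)(φ(K)ᵀV), so the (d × d) matrix φ(K)ᵀV is formed once instead of the (n × n) matrix QKᵀ. Below is a minimal NumPy sketch of that idea, using the common φ(x) = elu(x) + 1 feature map; this is an illustrative assumption, not this repository's actual implementation.

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Kernel-trick linear attention sketch.

    q, k: (n, d) queries/keys; v: (n, e) values.
    Cost is O(n * d * e) rather than the O(n^2 * d) of softmax attention.
    """
    # Feature map phi(x) = elu(x) + 1, one common choice that keeps scores positive.
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))

    q, k = phi(q), phi(k)
    # Associativity: build (d, e) context matrix phi(K)^T V first,
    # avoiding the (n, n) attention matrix entirely.
    context = k.T @ v
    # Per-query normalizer phi(Q) (sum_i phi(K)_i), shape (n,).
    norm = q @ k.sum(axis=0)
    return (q @ context) / (norm[:, None] + eps)

rng = np.random.default_rng(0)
q = rng.standard_normal((128, 64))
k = rng.standard_normal((128, 64))
v = rng.standard_normal((128, 64))
out = linear_attention(q, k, v)  # shape (128, 64)
```

Because the (n × n) score matrix is never materialized, memory and time scale linearly in the sequence length n for fixed head dimension.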

implemented mup #20

Closed: thomasfortin1 closed this 6 months ago

thomasfortin1 commented 6 months ago

Oops, meant to make this for my own fork of the repo. Closed