Implementation of the model "Hedgehog" from the paper "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry". The paper trains MLP feature maps on queries and keys to mimic the softmax attention weights of a transformer, giving subquadratic linear attention. Supposedly hits SOTA on WikiText for subquadratic models. I've been thinking about replacing softmax with MLPs too. This past month we saw dozens of papers on Mamba and convolutions, but MLPs might have undiscovered powers.
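Below is a minimal sketch of the core idea, not the authors' implementation: a trainable feature map `phi(x) = [exp(xW), exp(-xW)]` applied to queries and keys, plugged into standard (non-causal) linear attention. The single-linear-layer feature map, names, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class HedgehogFeatureMap(nn.Module):
    """Trainable feature map phi(x) = [exp(xW), exp(-xW)].

    Sketch only: the linear layer is meant to be trained so that
    phi(q) . phi(k) mimics softmax attention weights.
    """

    def __init__(self, head_dim: int):
        super().__init__()
        self.proj = nn.Linear(head_dim, head_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.proj(x)
        # Concatenated +/- exponentials keep the feature map positive
        return torch.cat([torch.exp(z), torch.exp(-z)], dim=-1)


def linear_attention(q, k, v, feature_map):
    """O(n) attention: phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1).

    q, k, v: (batch, heads, seq_len, head_dim). Non-causal for simplicity;
    a causal version would replace the sums with cumulative sums.
    """
    q, k = feature_map(q), feature_map(k)            # (B, H, N, 2D)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)       # (B, H, 2D, Dv)
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)


if __name__ == "__main__":
    # Illustrative shapes
    B, H, N, D = 2, 4, 128, 64
    fmap = HedgehogFeatureMap(D)
    q, k, v = (torch.randn(B, H, N, D) for _ in range(3))
    out = linear_attention(q, k, v, fmap)
    print(out.shape)  # torch.Size([2, 4, 128, 64])
```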
License: MIT