kyegomez / Hedgehog

Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"
https://discord.gg/GYbXvDGevY

Multi-Modality

HedgeHog

Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry". The paper trains MLP feature maps to mimic the softmax attention of a transformer, giving a linear (sub-quadratic) attention. It supposedly hits SOTA on WikiText among sub-quadratic models. I've also been thinking about replacing softmax with MLPs. This past month we saw dozens of papers on Mamba and convolutions, but MLPs might have undiscovered powers.
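To give the idea a concrete shape, here is a minimal NumPy sketch of linear attention with a learned MLP feature map. This is not the official implementation, just an illustration under assumptions: the feature-map MLP (`mlp_feature_map`, weights `w1`, `w2`) is hypothetical, and the exponential output keeps features non-negative, as softmax mimicry requires.

```python
import numpy as np

def mlp_feature_map(x, w1, w2):
    # Hypothetical 1-hidden-layer MLP feature map (w1, w2 are illustrative names).
    h = np.maximum(x @ w1, 0.0)               # ReLU hidden layer
    z = h @ w2
    # exp keeps features positive; subtracting the row max is for numerical stability
    return np.exp(z - z.max(axis=-1, keepdims=True))

def linear_attention(q, k, v, w1, w2):
    # Computes phi(q) (phi(k)^T v) in O(n * f * d) instead of the O(n^2 * d)
    # cost of materializing the full softmax attention matrix.
    phi_q = mlp_feature_map(q, w1, w2)        # (n, f)
    phi_k = mlp_feature_map(k, w1, w2)        # (n, f)
    kv = phi_k.T @ v                          # (f, d), shared across all queries
    z = phi_q @ phi_k.sum(axis=0)             # (n,) per-query normalizer
    return (phi_q @ kv) / z[:, None]

# Usage: because the weights are non-negative and normalized, each output row
# is a convex combination of the value rows, just like softmax attention.
rng = np.random.default_rng(0)
n, d, f = 4, 8, 16
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = np.ones((n, d))
w1 = rng.standard_normal((d, 32))
w2 = rng.standard_normal((32, f))
out = linear_attention(q, k, v, w1, w2)       # shape (4, 8), all entries 1.0
```

In Hedgehog these feature-map weights are trained so the resulting attention weights match those of a softmax teacher; the sketch above only shows the inference-time computation.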

License

MIT