quadratic_linear_attn implementation

kyegomez / Hedgehog

Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"

https://discord.gg/GYbXvDGevY

MIT License


quadratic_linear_attn implementation #4

Open Kiet0712 opened 2 months ago

Kiet0712 commented 2 months ago

I think you should add an epsilon to the denominator in the output of the quadratic_linear_attn function to prevent NaN values when training the Hedgehog MLP: qk / (qk.sum(dim=-1, keepdim=True) + epsilon)
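A minimal sketch of the suggested change, for illustration only. It assumes that quadratic_linear_attn forms pairwise dot products of already feature-mapped, non-negative q and k tensors of shape (batch, heads, seq_len, feature_dim) and normalizes each row by its sum. The einsum layout, the eps parameter name, and its default value are assumptions, not code taken from this repository.

```python
import torch


def quadratic_linear_attn(q, k, eps=1e-6):
    """Row-normalized attention weights from feature-mapped q and k.

    Assumed shapes: q is (batch, heads, m, d), k is (batch, heads, n, d).
    `eps` is a hypothetical parameter added to the denominator so that a
    row whose sum is zero yields zeros instead of NaN (0 / 0).
    """
    qk = torch.einsum("bhmd,bhnd->bhmn", q, k)
    return qk / (qk.sum(dim=-1, keepdim=True) + eps)


# Degenerate case: the feature map produced all zeros for the queries.
q = torch.zeros(1, 1, 2, 4)
k = torch.ones(1, 1, 3, 4)

qk = torch.einsum("bhmd,bhnd->bhmn", q, k)
unsafe = qk / qk.sum(dim=-1, keepdim=True)  # 0 / 0 in every entry
safe = quadratic_linear_attn(q, k)          # 0 / eps in every entry

print(torch.isnan(unsafe).any().item())
print(torch.isnan(safe).any().item())
```

Note that an additive epsilon only guards the denominator when the row sums are non-negative. If the feature map can produce negative values, the sum could still land at or near zero, and clamping the denominator might be a safer variant.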


github-actions[bot] commented 2 months ago

Hello there, thank you for opening an issue! 🙏🏻 The team has been notified and will get back to you ASAP.

kyegomez commented 2 months ago

@Kiet0712 if you could open a PR, that would be nice :)