kyegomez / Hedgehog

Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"
https://discord.gg/GYbXvDGevY
MIT License

quadratic_linear_attn implementation #4

Open Kiet0712 opened 2 months ago

Kiet0712 commented 2 months ago

I think you should add an epsilon to the denominator of the output of the quadratic_linear_attn function to prevent NaN values when training the Hedgehog MLP: qk / (qk.sum(dim=-1, keepdim=True) + epsilon)
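To make the suggestion concrete, here is a minimal sketch of what the change could look like. It is not copied from the repository: the einsum layout, the tensor shapes (batch, heads, sequence length, feature dimension), and the default eps=1e-6 are assumptions made for illustration. It also assumes q and k are already non-negative feature-map outputs, so the row sums are never negative.

```python
import torch


def quadratic_linear_attn(q: torch.Tensor, k: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Row-normalized quadratic attention weights with a guarded denominator.

    Assumed shapes (illustrative): q and k are (batch, heads, seq_len, feature_dim)
    and have already been passed through a non-negative feature map.
    """
    # Pairwise query-key scores: (batch, heads, seq_len_q, seq_len_k)
    qk = torch.einsum("bhmd,bhnd->bhmn", q, k)
    # Adding eps keeps the division finite when a row of qk sums to zero
    return qk / (qk.sum(dim=-1, keepdim=True) + eps)


torch.manual_seed(0)
q = torch.rand(1, 2, 4, 8)
k = torch.rand(1, 2, 4, 8)
q[:, :, 0] = 0.0  # force one query row whose scores sum to exactly zero

unsafe = quadratic_linear_attn(q, k, eps=0.0)  # 0 / 0 in the zeroed row
safe = quadratic_linear_attn(q, k)             # 0 / eps in the zeroed row

print("NaN without eps:", torch.isnan(unsafe).any().item())
print("NaN with eps:", torch.isnan(safe).any().item())
```

With eps the zeroed row comes out as all zeros instead of NaN, and the other rows still sum to roughly 1. If the feature maps can produce negative values, a plain additive eps may not be enough, since the row sum could still land at or near zero.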

github-actions[bot] commented 2 months ago

Hello there, thank you for opening an issue! 🙏🏻 The team has been notified and will get back to you ASAP.

kyegomez commented 2 months ago

@Kiet0712 if you could open up a PR, that would be nice :)