kyegomez / AttentionIsOFFByOne

Implementation of "Attention Is Off By One" by Evan Miller
MIT License
179 stars · 9 forks

How to solve the overflow issue #2

Open ZGCTroy opened 1 year ago

ZGCTroy commented 1 year ago

When x_i is large, torch.exp(x_i) will overflow.

In your implementation, x_i = x_i - x_max. So should the softmax_one equation be exp(x_i - x_max) / (1 + sum_j exp(x_j - x_max))?
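
For reference: shifting by x_max rescales both terms of the denominator, so the +1 must become exp(-x_max), i.e. exp(x_i - x_max) / (exp(-x_max) + sum_j exp(x_j - x_max)). A minimal sketch of a numerically stable softmax_one in PyTorch (the function name and the clamp detail are illustrative, not necessarily the repo's implementation):

```python
import torch

def softmax_one(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Numerically stable softmax_one: exp(x_i) / (1 + sum_j exp(x_j)).

    Shifting by m = max(x) rescales numerator and denominator alike:
    exp(x_i - m) / (exp(-m) + sum_j exp(x_j - m)).
    """
    m = x.max(dim=dim, keepdim=True).values
    # Clamp at 0 so exp(-m) cannot itself overflow when every logit is very
    # negative; in that regime exp(x - m) underflows to 0 and the output
    # correctly goes to 0 (the "attend to nothing" case).
    m = torch.clamp(m, min=0.0)
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

# Example: large logits no longer overflow.
x = torch.tensor([[1e4, 1.0, -2.0]])
print(softmax_one(x))  # tensor([[1., 0., 0.]]) -- finite, no inf/nan
```

With plain softmax the max shift alone suffices because the +1 is absent; here the clamp keeps exp(-m) bounded while preserving the all-zero output when every logit is strongly negative.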
