Open ZGCTroy opened 1 year ago
When x_i is large, torch.exp(x_i) will overflow.
Your implementation shifts the input first, x_i = x_i - x_max. So should the softmax-one equation be exp(x_i - x_max) / (1 + sum_j exp(x_j - x_max))?
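For reference, here is a minimal sketch of how the shift interacts with the "+1" term, assuming softmax-one is defined as exp(x_i) / (1 + sum_j exp(x_j)). It uses plain Python lists and math.exp rather than torch, and the function names are hypothetical, not taken from the repository. Multiplying numerator and denominator by exp(-m) shows that the 1 becomes exp(-m) after shifting by m, so keeping a literal 1 would give a different value than the unshifted definition. Clamping m to max(0, x_max) is my own assumption to keep exp(-m) from overflowing when all inputs are very negative.

```python
import math


def softmax_one_naive(xs):
    # Direct definition: exp(x_i) / (1 + sum_j exp(x_j)).
    # math.exp raises OverflowError for large inputs (torch.exp would return inf).
    denom = 1.0 + sum(math.exp(x) for x in xs)
    return [math.exp(x) / denom for x in xs]


def softmax_one_shifted(xs):
    # Assumption: shift by m = max(0, x_max) so every exponent below is <= 0.
    m = max(0.0, max(xs))
    exps = [math.exp(x - m) for x in xs]
    # The "+1" of the definition turns into exp(-m) after the shift.
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]
```

On small inputs the two functions agree up to rounding, and the shifted one still returns finite values for inputs such as [1000.0, 999.0], where the naive one fails.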