Open yonigottesman opened 1 year ago
The problem arises in chapter:
I think the softmax should be computed over the whole logit vector z_t, with the index i then selecting one component of the result. The book instead applies the softmax to a single logit z_{t,i}, which doesn't make sense: softmax needs the logits for all values of i in order to normalize.
Describe the bug
Expected behavior
The formula should look like this:

![image](https://user-images.githubusercontent.com/4004127/224647382-113a6ba7-bf37-4fab-ac2c-c7595f1c2056.png)
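
To make the point about normalization concrete, here is a minimal sketch in plain Python. It is only an illustration: the logit values, the vocabulary size, and the helper name `softmax` are made up, not taken from the book. It uses the standard definition softmax(z_t)_i = exp(z_{t,i}) / sum_j exp(z_{t,j}) and compares it with a softmax applied to the single logit z_{t,i}.

```python
import math


def softmax(logits):
    """Softmax over a full vector of logits (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits z_t over a 4-token vocabulary at time step t.
z_t = [2.0, 1.0, 0.1, -1.0]
i = 1  # index of the token whose probability we want

# softmax(z_t)_i: normalize over all logits, then pick component i.
probs = softmax(z_t)
p_i = probs[i]

# softmax(z_{t,i}): a softmax over one logit has nothing to normalize against,
# so it is 1 regardless of the logit's value.
p_single = softmax([z_t[i]])[0]

print(f"softmax(z_t)_i   = {p_i:.4f}")
print(f"softmax(z_(t,i)) = {p_single:.4f}")
```

The second value is always 1, whatever the logit is, which is why the index has to be applied after the softmax rather than inside it.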