kyegomez / xLSTM

Implementation of xLSTM in Pytorch from the paper: "xLSTM: Extended Long Short-Term Memory"
MIT License

tanh(C_t) is not in the final paper #2

Status: Closed (twoletters closed this issue 2 weeks ago)

twoletters commented 4 months ago

https://github.com/kyegomez/xLSTM/blob/020209fd7c156852a12a82d1bb21ce4a11309fc0/xlstm_torch/main.py#L96

On this line, c_hat is passed through tanh. In the final paper, however, h_tilda (called c_hat here) is just c_t / n_t with no activation applied, so h_t should be:

h_t = self.sigmoid(o_tilda) * c_hat

Excerpt from Section A.2 of the final paper:

Here, the cell input activation function φ is tanh, the hidden state activation function is the identity. φ helps stabilizing the recurrence.
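To make the difference concrete, here is a minimal scalar sketch in plain Python rather than the repository's PyTorch code. The function name `hidden_state` and the `squash` flag are hypothetical and exist only for this illustration. With `squash=True` it mirrors the behavior described above (tanh applied to c_t / n_t); with `squash=False` it uses the identity, as in the excerpt from Section A.2.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here for the output gate."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_state(c_t: float, n_t: float, o_tilda: float, squash: bool = False) -> float:
    """Compute h_t from the cell state c_t and normalizer n_t.

    squash=False: h_t = sigmoid(o_tilda) * (c_t / n_t)        (identity, per the excerpt)
    squash=True:  h_t = sigmoid(o_tilda) * tanh(c_t / n_t)    (behavior reported in this issue)
    """
    c_hat = c_t / n_t
    if squash:
        c_hat = math.tanh(c_hat)
    return sigmoid(o_tilda) * c_hat


# Illustrative values only; they are not taken from the repository.
c_t, n_t, o_tilda = 3.0, 1.5, 0.0

h_paper = hidden_state(c_t, n_t, o_tilda)               # 0.5 * 2.0
h_repo = hidden_state(c_t, n_t, o_tilda, squash=True)   # 0.5 * tanh(2.0)

print(h_paper, h_repo)
```

With these inputs the tanh variant caps the normalized cell state below 1, so the two versions diverge whenever |c_t / n_t| is not small.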


github-actions[bot] commented 3 weeks ago

Stale issue message