Closed twoletters closed 2 weeks ago
https://github.com/kyegomez/xLSTM/blob/020209fd7c156852a12a82d1bb21ce4a11309fc0/xlstm_torch/main.py#L96
On this line, `c_hat` is passed through `tanh`, but in the final paper, `h_tilda` (a.k.a. `c_hat` here) is only `c_t / n_t`, so `h_t` should be:
h_t = self.sigmoid(o_tilda) * c_hat
Excerpt from Section A.2 of the final paper:
> Here, the cell input activation function φ is tanh, the hidden state activation function is the identity. φ helps stabilizing the recurrence.
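
To make the difference concrete, here is a minimal scalar sketch of the two variants. It is not the repository's code: the function names `hidden_state_identity` and `hidden_state_tanh` are hypothetical, plain floats stand in for tensors, and only the names `o_tilda`, `c_t`, `n_t`, and `c_hat` are taken from the issue text.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, used here for the output gate."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_state_identity(o_tilda: float, c_t: float, n_t: float) -> float:
    """Proposed fix: the hidden state activation is the identity."""
    c_hat = c_t / n_t  # normalized cell state (h_tilda in the paper)
    return sigmoid(o_tilda) * c_hat


def hidden_state_tanh(o_tilda: float, c_t: float, n_t: float) -> float:
    """Behavior described in the issue: c_hat is squashed by tanh first."""
    c_hat = c_t / n_t
    return sigmoid(o_tilda) * math.tanh(c_hat)


# Illustrative values only (assumed, not taken from the repository).
h_fixed = hidden_state_identity(0.0, 2.0, 4.0)
h_old = hidden_state_tanh(0.0, 2.0, 4.0)
print(h_fixed, h_old)
```

With these inputs the identity variant returns `sigmoid(0) * 0.5 = 0.25`, while the `tanh` variant returns a smaller value, because `tanh` compresses `c_hat` before the output gate is applied.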