Closed by yufengwhy 4 years ago
In other words, the forward pass behaves like an RNN, while the backward pass behaves like a Markov model.
If you are referring to the lack of BPTT, check out issue #22 and the updated README. The tl;dr is that BPTT didn't improve performance on real-life session data in experiments, so it was not included in the public version. It might be added later for folks working with longer sequences.
The code only defines a single timestep of the RNN, so the gradient from timesteps >= t+2 cannot be propagated back to timestep t? That looks more like a Markov assumption (only timesteps t and t+1 are related) than an RNN assumption.
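To make the distinction concrete, here is a minimal sketch. It is not the repository's code, just a scalar toy RNN in plain Python with hypothetical names (`forward`, `loss`, `grad_w`). It assumes `h_t = tanh(w*h_{t-1} + u*x_t)` with a squared-error loss summed over timesteps. The forward pass is the same in both modes, so the hidden state still carries information from every earlier step. Only the backward pass differs: with `truncate=True` the previous hidden state is treated as a constant, so no gradient flows back across timesteps.

```python
import math


def forward(w, u, xs, h0=0.0):
    """Scalar toy RNN: h_t = tanh(w * h_{t-1} + u * x_t). Returns [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def loss(w, u, xs, ys):
    """Sum over timesteps of 0.5 * (h_t - y_t)^2."""
    hs = forward(w, u, xs)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))


def grad_w(w, u, xs, ys, truncate):
    """dLoss/dw, either with full BPTT or with one-step truncation.

    truncate=True treats h_{t-1} as a constant at every step, so the loss
    at step t only updates w through the computation of step t itself.
    """
    hs = forward(w, u, xs)
    grad = 0.0
    carry = 0.0  # gradient w.r.t. h_t arriving from later timesteps
    for t in range(len(xs), 0, -1):
        dh = (hs[t] - ys[t - 1]) + carry
        dpre = dh * (1.0 - hs[t] ** 2)  # back through tanh
        grad += dpre * hs[t - 1]
        carry = 0.0 if truncate else dpre * w
    return grad


if __name__ == "__main__":
    w, u = 0.8, 0.5
    xs = [1.0, -0.5, 0.3]
    ys = [0.2, 0.1, -0.4]
    print("full BPTT gradient:", grad_w(w, u, xs, ys, truncate=False))
    print("truncated gradient:", grad_w(w, u, xs, ys, truncate=True))
```

On a sequence of length 1 the two gradients coincide. On longer sequences they diverge, and that gap is what BPTT would add back.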