mobeets / sarsa-rnn


well, somehow it appears to be working #1

Closed mobeets closed 2 years ago

mobeets commented 2 years ago

I remember now that training an RNN with a bootstrapping method like SARSA is potentially a little tricky: technically you're supposed to take a gradient step every iteration, but the gradient depends on recent time steps (because the RNN carries a hidden state), which means successive gradient steps are correlated over time.
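For reference, the one-step SARSA target being bootstrapped here looks like this (the discount value is just a stand-in):

```python
def sarsa_td_error(q_sa, reward, q_next_sa, gamma=0.9):
    """One-step on-policy SARSA TD error:
    delta_t = r_t + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t),
    where a_{t+1} is the action the policy actually took."""
    return reward + gamma * q_next_sa - q_sa

# e.g. Q(s,a)=0.5, r=1.0, Q(s',a')=0.0 -> delta = 0.5
print(sarsa_td_error(0.5, 1.0, 0.0, gamma=0.9))
```

Because `q_next_sa` comes from the network's own (hidden-state-dependent) estimates, consecutive TD errors within a trial are not independent, which is where the correlated-gradients issue comes from.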

Nevertheless, it appears to be...learning?

This is where I take a gradient step after every episode, where each episode is 20 concatenated trials.
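A minimal sketch of that per-episode update, assuming a GRU Q-network trained with PyTorch on fake data (the dimensions, network, and optimizer settings here are all illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn

class RNNQNet(nn.Module):
    """Recurrent Q-network: GRU over observations, linear readout to Q-values."""
    def __init__(self, obs_dim=4, n_actions=2, hidden_dim=16):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, h0=None):
        out, h = self.rnn(obs_seq, h0)   # out: (batch, T, hidden_dim)
        return self.head(out), h         # Q-values: (batch, T, n_actions)

def episode_loss(qnet, obs, actions, rewards, gamma=0.9):
    """Mean squared SARSA TD error over one episode, with BPTT through the GRU."""
    q_all, _ = qnet(obs)                            # (1, T, n_actions)
    T = actions.shape[0]
    q_taken = q_all[0, torch.arange(T), actions]    # Q(s_t, a_t)
    # SARSA bootstraps from the *taken* next action; the target is treated
    # as fixed (detached), and the final step bootstraps from 0.
    q_next = torch.cat([q_taken.detach()[1:], torch.zeros(1)])
    targets = rewards + gamma * q_next
    return ((q_taken - targets) ** 2).mean()

# One gradient step per episode of 20 concatenated trials (fake data here):
qnet = RNNQNet()
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
T = 20
obs = torch.randn(1, T, 4)
actions = torch.randint(0, 2, (T,))
rewards = torch.randn(T)
loss = episode_loss(qnet, obs, actions, rewards)
opt.zero_grad()
loss.backward()
opt.step()
```

Batching the whole episode into one step at least averages the per-step TD errors before updating, though consecutive episodes' gradients can still be correlated if the hidden state or data distribution drifts slowly.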