mobeets / q-rnn

R2D2 using Ray #13

Closed by mobeets 1 year ago

mobeets commented 1 year ago

https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#dqn

So, just a reminder of what R2D2 can offer:

better handling of RNN hidden states (e.g., burn-in)
improvements used in Rainbow (specifically: double Q-learning, dueling networks, prioritized replay, and multi-step learning)
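
As a rough illustration of two of the items above (burn-in and the multi-step double Q-learning target), here is a minimal sketch in plain Python. The function names `split_burn_in` and `n_step_double_q_target` are made up for this note and are not part of Ray. The sketch also leaves out R2D2's value rescaling, so treat it as a simplified picture rather than what RLlib actually does.

```python
def split_burn_in(sequence, burn_in):
    """Split a replayed sequence into (burn-in prefix, training suffix).

    The prefix is only used to warm up the RNN hidden state;
    the loss is computed on the suffix.
    """
    return sequence[:burn_in], sequence[burn_in:]


def n_step_double_q_target(rewards, gamma, q_online_next, q_target_next, done):
    """n-step double Q-learning target (hypothetical helper).

    rewards:        the n rewards r_t, ..., r_{t+n-1}
    q_online_next:  online-network Q-values at s_{t+n}, one per action
    q_target_next:  target-network Q-values at s_{t+n}, one per action
    done:           True if the episode ended within the n steps
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        return ret
    # Double Q-learning: the online net picks the action, the target net scores it.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return ret + (gamma ** n) * q_target_next[best_action]


if __name__ == "__main__":
    prefix, suffix = split_burn_in(list(range(10)), burn_in=4)
    print(prefix, suffix)
    print(n_step_double_q_target([1.0, 1.0], 0.5, [0.0, 2.0], [10.0, 4.0], False))
```

Note that the target uses the target network's value for the action chosen by the online network (4.0 in the example), not the target network's own maximum (10.0).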

One downside of Ray's implementation is that it seems to use an RNN or LSTM but not a GRU? Not sure how difficult that would be to modify.

mobeets commented 1 year ago

Okay, so there are three features we need in order to use R2D2:

custom envs (no prob)
previous action/reward in observation
GRU instead of LSTM

For the GRU part, this is doable but may require a little fiddling. See here, which points to the example code here.
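
To make concrete what swapping in a GRU involves, here is a minimal sketch of a single GRU step in plain Python, using one common formulation of the gates. The names `gru_step` and `zero_params` are made up for this note. In practice a custom model would more likely wrap a library layer such as `torch.nn.GRU`, whose gate conventions differ slightly. The point relevant here is that a GRU carries a single hidden state `h`, whereas an LSTM carries a pair `(h, c)`, so any state-handling code written for the LSTM needs adjusting.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W, x, b):
    """Compute W @ x + b for list-of-lists W and list vectors x, b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def gru_step(x, h, p):
    """One GRU step (hypothetical helper); returns the new hidden state.

    p is a dict with input weights W_z, W_r, W_n, recurrent weights
    U_z, U_r, U_n, and biases b_z, b_r, b_n.
    """
    zeros = [0.0] * len(h)
    # Update gate z and reset gate r.
    z = [_sigmoid(a + c) for a, c in
         zip(_affine(p["W_z"], x, p["b_z"]), _affine(p["U_z"], h, zeros))]
    r = [_sigmoid(a + c) for a, c in
         zip(_affine(p["W_r"], x, p["b_r"]), _affine(p["U_r"], h, zeros))]
    # Candidate state, computed from the reset-gated previous state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + c) for a, c in
         zip(_affine(p["W_n"], x, p["b_n"]), _affine(p["U_n"], rh, zeros))]
    # Blend candidate and previous state; there is no separate cell state.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def zero_params(input_size, hidden_size):
    """All-zero parameters, handy for sanity checks."""
    p = {}
    for gate in ("z", "r", "n"):
        p["W_" + gate] = [[0.0] * input_size for _ in range(hidden_size)]
        p["U_" + gate] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p["b_" + gate] = [0.0] * hidden_size
    return p


if __name__ == "__main__":
    print(gru_step([3.0], [2.0, -4.0], zero_params(1, 2)))
```

With all-zero parameters both gates sit at 0.5 and the candidate state is zero, so the step simply halves the previous hidden state, which makes for a quick sanity check.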

For the previous action/reward part, this may be part of their "wrappers," but I also saw it mentioned as part of the LSTM add-on, so it's definitely possible.
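
If neither built-in route works out, the wrapper approach is simple enough to do by hand. Below is a minimal sketch in plain Python. `ToyEnv` and `PrevActionRewardWrapper` are hypothetical names, and the 4-tuple `step` return (older gym style) is an assumption rather than what Ray requires. On the built-in side, RLlib's model config has, as far as I can tell, exposed `lstm_use_prev_action` and `lstm_use_prev_reward` flags in some versions, which is worth checking against the installed version before writing a wrapper.

```python
class ToyEnv:
    """Tiny stand-in environment (hypothetical) with a 1-d observation."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 3
        return [float(self.t)], reward, done, {}


class PrevActionRewardWrapper:
    """Appends one-hot previous action and previous reward to the observation."""

    def __init__(self, env, n_actions):
        self.env = env
        self.n_actions = n_actions

    def _augment(self, obs, prev_action, prev_reward):
        one_hot = [0.0] * self.n_actions
        if prev_action is not None:
            one_hot[prev_action] = 1.0
        return list(obs) + one_hot + [float(prev_reward)]

    def reset(self):
        # No previous action at the start: all-zero one-hot and zero reward.
        return self._augment(self.env.reset(), None, 0.0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs, action, reward), reward, done, info


if __name__ == "__main__":
    env = PrevActionRewardWrapper(ToyEnv(), n_actions=2)
    print(env.reset())
    print(env.step(1))
```

The augmented observation has length `obs_dim + n_actions + 1`, so the observation space declared to the algorithm would need to grow accordingly.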