mobeets closed this issue 1 year ago
Okay so there are three features we need to use R2D2:
For the GRU part, this is doable but may require a little fiddling. See here, which points to the example code here.
For the previous action/reward part, this may be part of their "wrappers," but I also saw it mentioned as part of the LSTM add-on, so it's definitely possible.
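To make the previous action/reward part concrete, here is a minimal sketch of the input we'd want the recurrent net to see at each step, independent of however RLlib ends up doing it. The function name `augment_observation` and its parameters are hypothetical (not from RLlib), and it assumes a discrete action space and a flat observation vector.

```python
from typing import List, Sequence


def augment_observation(
    obs: Sequence[float],
    prev_action: int,
    prev_reward: float,
    num_actions: int,
) -> List[float]:
    """Append a one-hot previous action and the previous reward to obs.

    Hypothetical helper for illustration only; assumes discrete actions
    indexed 0..num_actions-1 and a flat (1-D) observation.
    """
    if not 0 <= prev_action < num_actions:
        raise ValueError(f"prev_action {prev_action} out of range")
    # One-hot encode the previous action.
    one_hot = [0.0] * num_actions
    one_hot[prev_action] = 1.0
    # Result layout: [obs..., one_hot(prev_action)..., prev_reward]
    return list(obs) + one_hot + [float(prev_reward)]
```

At the first step of an episode there is no previous action or reward, so whichever wrapper does this would need a convention for that case (e.g. all zeros for both); that is a choice we'd have to make, not something taken from Ray.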
https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#dqn
So, just a reminder of what R2D2 can offer:
A downside of Ray's implementation is that it seems to use an RNN or LSTM but not a GRU. Not sure how difficult that would be to modify.
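For reference on what swapping in a GRU would involve, here is a rough pure-Python sketch of a single GRU step using the standard update/reset gate equations. This is not Ray's code; `gru_step` and the `params` dictionary layout are made up for illustration. The main practical difference from an LSTM is that a GRU carries one hidden state vector rather than a hidden state plus a cell state, so any code that passes two state tensors around would need adjusting.

```python
import math
from typing import Dict, List

Vector = List[float]


def _matvec(m: List[Vector], v: Vector) -> Vector:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h: Vector, params: Dict[str, list]) -> Vector:
    """One GRU step: returns the next hidden state.

    params (hypothetical layout): W_z, W_r, W_n are hidden x input
    matrices; U_z, U_r, U_n are hidden x hidden; b_z, b_r, b_n are
    hidden-sized bias vectors.
    """
    def pre_activation(name: str) -> Vector:
        wx = _matvec(params["W_" + name], x)
        uh = _matvec(params["U_" + name], h)
        return [a + b + c for a, b, c in zip(wx, uh, params["b_" + name])]

    z = [_sigmoid(v) for v in pre_activation("z")]  # update gate
    r = [_sigmoid(v) for v in pre_activation("r")]  # reset gate

    # Candidate state: the reset gate scales the recurrent contribution.
    wx_n = _matvec(params["W_n"], x)
    uh_n = _matvec(params["U_n"], h)
    n = [
        math.tanh(a + ri * b + c)
        for a, ri, b, c in zip(wx_n, r, uh_n, params["b_n"])
    ]

    # Interpolate between the candidate and the old hidden state.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
```

In practice we'd use a library GRU layer rather than this; the sketch is only meant to show that the state is a single vector.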