tspooner / rsrl

A fast, safe and easy to use reinforcement learning framework in Rust.
https://crates.io/crates/rsrl
MIT License

Persistent Advantage Learning #30

Closed: tspooner closed this 6 years ago

tspooner commented 6 years ago

This PR adds one new agent to the control::td package. It is based on the work of Bellemare et al., "Increasing the Action Gap: New Operators for Reinforcement Learning" (https://arxiv.org/pdf/1512.04860.pdf), which develops new operators that prevent the divergence exhibited, in some cases, by the conventional Bellman operator.
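For readers unfamiliar with the operator, here is a minimal tabular sketch of the sample-based persistent advantage learning (PAL) update as described by Bellemare et al. It is not the rsrl implementation. `QTable`, `state_value`, `pal_target` and `pal_update` are hypothetical names for illustration, and the sketch assumes a non-terminal transition `(s, a, r, s')`, a discount `gamma`, and an advantage coefficient `alpha` in [0, 1). The target starts from the usual Q-learning target and subtracts a fraction of the action gap, measured either at the current state (advantage learning) or at the next state (the persistent term). It then keeps the larger of the two.

```rust
//! Hypothetical tabular sketch of the persistent advantage learning (PAL)
//! update; assumes non-terminal transitions and is not the rsrl API.

/// Tabular action-value function indexed as `q[state][action]`.
type QTable = Vec<Vec<f64>>;

/// Greedy state value V(x) = max_b Q(x, b).
fn state_value(q: &QTable, state: usize) -> f64 {
    q[state].iter().copied().fold(f64::NEG_INFINITY, f64::max)
}

/// Sample-based PAL target for a non-terminal transition (s, a, r, s').
fn pal_target(q: &QTable, s: usize, a: usize, r: f64, s_next: usize, gamma: f64, alpha: f64) -> f64 {
    let v_next = state_value(q, s_next);
    // Standard Q-learning (Bellman optimality) target.
    let bellman = r + gamma * v_next;
    // Advantage learning: subtract a fraction of the action gap at x.
    let advantage = bellman - alpha * (state_value(q, s) - q[s][a]);
    // Persistent term: the same correction, measured at the next state x'.
    let persistent = bellman - alpha * (v_next - q[s_next][a]);
    // PAL keeps whichever corrected target is larger.
    advantage.max(persistent)
}

/// One PAL update of Q(s, a) with step size `lr`.
fn pal_update(q: &mut QTable, s: usize, a: usize, r: f64, s_next: usize, gamma: f64, alpha: f64, lr: f64) {
    let target = pal_target(q, s, a, r, s_next, gamma, alpha);
    q[s][a] += lr * (target - q[s][a]);
}

fn main() {
    // Two states, two actions; arbitrary initial values for illustration.
    let mut q: QTable = vec![vec![0.0, 1.0], vec![0.5, 2.0]];
    pal_update(&mut q, 0, 0, 1.0, 1, 0.9, 0.9, 0.1);
    println!("Q(0, 0) after one PAL update: {:.2}", q[0][0]);
}
```

Running it prints `Q(0, 0) after one PAL update: 0.19`. Setting `alpha = 0.0` makes both corrections vanish, so the update reduces to ordinary Q-learning. Because V(x) >= Q(x, a) for every action, the PAL target never exceeds the Q-learning target.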

One example script is provided; in it, performance appears to be slightly better than that of conventional Q-learning.

Note: further variants of this algorithm could be added. We leave these for interested parties to implement themselves, or for a later time when it is more appropriate.