thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

V-trace support? #17

Closed. szrlee closed this issue 1 year ago.

szrlee commented 4 years ago
fengredrum commented 4 years ago

I suggest that you refer to this paper: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. I do have a plan to add the algorithm to this platform, and I've already made it work, but the code is not compatible with the current platform. It'll still take some time to adjust.
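For context, the core of V-trace in the IMPALA paper is an off-policy value target built from truncated importance weights. A minimal single-trajectory sketch (not Tianshou's API; function and argument names are illustrative, and episode-termination masking is omitted for brevity) might look like:

```python
import torch

def vtrace_targets(rewards, values, bootstrap_value,
                   target_log_probs, behaviour_log_probs,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets (Espeholt et al., 2018) for one trajectory.

    rewards, values, *_log_probs: 1-D tensors of length T.
    bootstrap_value: scalar tensor V(x_T) for bootstrapping past the rollout.
    """
    # Truncated importance weights pi(a|x) / mu(a|x).
    rhos = torch.exp(target_log_probs - behaviour_log_probs)
    clipped_rhos = torch.clamp(rhos, max=rho_bar)  # rho_t, scales TD errors
    cs = torch.clamp(rhos, max=c_bar)              # c_t, trace coefficients

    next_values = torch.cat([values[1:], bootstrap_value.unsqueeze(0)])
    deltas = clipped_rhos * (rewards + gamma * next_values - values)

    # Backward recursion:
    # v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    acc = torch.zeros_like(bootstrap_value)
    advantages = torch.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        advantages[t] = acc
    return values + advantages  # the v_s targets
```

With `rho_bar = c_bar = 1` and on-policy data (equal log-probs), this reduces to the ordinary n-step bootstrapped return, which is one way to sanity-check an implementation.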

szrlee commented 4 years ago

> I suggest that you refer to this paper: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. I do have a plan to add the algorithm to this platform, and I've already made it work, but the code is not compatible with the current platform. It'll still take some time to adjust.

Well, thanks @fengredrum. Actually, I am trying to make our algorithm "DAPO" from this paper, originally implemented in our "memorie" distributed framework, compatible with Tianshou. DAPO relies on the behaviour policy's probability $\pi_{old}(a_t|s_t)$ (necessary), multi-step bootstrapping (necessary), and V-trace (not necessary, but it subsumes the previous two features). Therefore, if V-trace is supported, it will be easier for me to reimplement DAPO. Last but not least, DAPO has been shown through empirical study to outperform IMPALA (which corresponds to (one-step) entropy augmentation in the DAPO paper).
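The "necessary" requirement above boils down to recording the behaviour policy's log-probability at collection time, since it cannot be recovered after the network is updated. A hedged sketch of the idea (the variable names `logits_old`/`logits_new` are stand-ins, not Tianshou internals):

```python
import torch
from torch.distributions import Categorical

# At collection time: sample an action and store the behaviour policy's
# log-probability alongside the transition in the replay buffer.
logits_old = torch.tensor([1.0, 0.0, -1.0])   # behaviour policy output
dist_old = Categorical(logits=logits_old)
action = dist_old.sample()
logp_behaviour = dist_old.log_prob(action)    # must be stored now

# ... later, after gradient updates have changed the network ...
logits_new = torch.tensor([0.5, 0.5, -1.0])   # current (target) policy output
logp_target = Categorical(logits=logits_new).log_prob(action)

# Truncated importance weight used by V-trace-style corrections:
rho = torch.clamp(torch.exp(logp_target - logp_behaviour), max=1.0)
```

Combined with multi-step bootstrapping over the stored trajectory, this ratio is the ingredient a V-trace implementation would expose for methods like DAPO.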

MischaPanch commented 1 year ago

Closing as stale (and lacking a description).