szrlee closed this issue 1 year ago
I suggest that you refer to this paper: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. I do have a plan to add the algorithm to this platform, and I've already made it work, but the code is not compatible with the current platform. It'll still take some time to adjust.
Well, thanks @fengredrum. Actually, I am trying to make our algorithm "DAPO" from this paper, originally implemented in our "memorie" distributed framework, compatible with Tianshou. DAPO relies on the behaviour policy's probability $\pi_{old}(a_t|s_t)$ (necessary), multi-step bootstrapping (necessary), and V-trace (not necessary, but it contains the previous two features). Therefore, if V-trace is supported, it will be easier for me to reimplement DAPO. Last but not least, DAPO has been shown empirically to outperform IMPALA (which corresponds to (one-step) entropy augmentation in the DAPO paper).
Closing as stale (and lacking description)