szrlee closed this issue 1 year ago
I suggest that you refer to this paper: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. I do have a plan to add the algorithm to this platform, and I've already made it work, but the code is not compatible with the current platform. It'll still take some time to adjust.
Well, thanks @fengredrum. Actually, I am trying to make our algorithm "DAPO" from this paper, originally implemented in our "memorie" distributed framework, compatible with Tianshou. DAPO relies on the behaviour policy's probability $\pi_{old}(a_t|s_t)$ (necessary), multi-step bootstrapping (necessary), and V-trace (not necessary, but it contains the previous two features). Therefore, if V-trace is supported, it will be easier for me to reimplement DAPO. Last but not least, DAPO has been shown empirically to outperform IMPALA (which corresponds to (one-step) entropy augmentation in the DAPO paper).
Closing as stale (and lacking description)