Open · MarcoMeter opened this issue 10 months ago
Hey @MarcoMeter, this is pretty cool! Sorry it took me a while to get back to you. Do you want to make it a bit more self-contained, like creating a `cleanrl/ppo_trxl/ppo_trxl.py`? You can declare the dependencies in `cleanrl/ppo_trxl/pyproject.toml` and `cleanrl/ppo_trxl/poetry.lock`.
The main things we are looking for are succinct, understandable implementations, benchmarks, and docs (see how we documented https://docs.cleanrl.dev/rl-algorithms/dqn/#dqn_ataripy as an example).
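Roughly, a self-contained file could be laid out like the sketch below. Every field name and default here is a placeholder just to show the shape, not a proposed API, and it assumes the tyro-parsed `Args` dataclass pattern newer CleanRL scripts use:

```python
# Hypothetical skeleton of cleanrl/ppo_trxl/ppo_trxl.py; all names and
# defaults are illustrative assumptions, not the final API.
from dataclasses import dataclass

import tyro


@dataclass
class Args:
    env_id: str = "SearingSpotlights-v0"  # assumed Memory Gym env id
    total_timesteps: int = 10_000_000
    learning_rate: float = 2.5e-4
    # Hypothetical Transformer-XL knobs:
    trxl_num_blocks: int = 3
    trxl_num_heads: int = 4
    trxl_dim: int = 384
    trxl_memory_length: int = 119


if __name__ == "__main__":
    args = tyro.cli(Args)
    # The envs, agent, buffers, and the PPO training loop would all live
    # below in this one file, so the script stays self-contained.
    print(args)
```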
Work is still in progress, so stay tuned ;) I'll open a PR once it's ready. Current state: https://github.com/MarcoMeter/episodic-transformer-memory-ppo/blob/cleanrl/train.py
@vwxyzjn I finally resolved the issue: only one linear layer is supposed to sit between the CNN and the transformer blocks. Quite surprising that this single additional layer hampered performance so much; see the sketch below.
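For reference, the intended layout looks roughly like this. The layer sizes and the heavily reduced `TrXLBlock` stand-in are illustrative placeholders, not the actual implementation:

```python
import torch
import torch.nn as nn


class TrXLBlock(nn.Module):
    """Heavily reduced stand-in for a Transformer-XL block: attends over the
    concatenation of cached episodic memory and the current embedding."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        kv = torch.cat([memory, x], dim=1)  # prepend memory to keys/values
        attn_out, _ = self.attn(self.norm1(x), kv, kv)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class Encoder(nn.Module):
    """CNN features -> exactly ONE linear projection -> TrXL blocks."""

    def __init__(self, embed_dim: int = 384, num_blocks: int = 3):
        super().__init__()
        self.cnn = nn.Sequential(  # Nature-CNN-style torso (assumed sizes)
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # The fix: a single linear layer between the CNN and the transformer;
        # a second projection here is what hurt performance.
        self.proj = nn.Linear(64 * 7 * 7, embed_dim)
        self.blocks = nn.ModuleList(TrXLBlock(embed_dim) for _ in range(num_blocks))

    def forward(self, obs: torch.Tensor, memories: list[torch.Tensor]) -> torch.Tensor:
        h = self.proj(self.cnn(obs / 255.0)).unsqueeze(1)  # (B, 1, embed_dim)
        for block, memory in zip(self.blocks, memories):
            h = block(h, memory)
        return h.squeeze(1)


enc = Encoder()
obs = torch.zeros(4, 3, 84, 84)                         # batch of 84x84 RGB frames
memories = [torch.zeros(4, 16, 384) for _ in range(3)]  # per-block episodic memory
print(enc(obs, memories).shape)                         # torch.Size([4, 384])
```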
Hey @vwxyzjn, it's been quite a few extremely busy months, but I finally have the capacity to contribute a single-file implementation of PPO with Transformer-XL as episodic memory. The implementation would be based on my repo. Concerning benchmarks, I would like to use Memory Gym (Code, Paper); a quick usage sketch follows below.
If you are interested, I'll get started soon.
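For context, a Memory Gym environment is created through the standard Gymnasium API once the package is installed. A rough sketch; the `memory_gym` import, the env id, and the gymnasium-based step/reset API are assumed from the Memory Gym README:

```python
import gymnasium as gym
import memory_gym  # noqa: F401 -- importing registers the Memory Gym envs (assumed)

# Env id assumed from the Memory Gym README; Mortar Mayhem and Mystery Path
# should register under similar ids.
env = gym.make("SearingSpotlights-v0")
obs, info = env.reset(seed=1)
done = False
while not done:
    action = env.action_space.sample()  # random policy, just to exercise the API
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```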