vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Contributing PPO + Transformer-XL #442

Open MarcoMeter opened 10 months ago

MarcoMeter commented 10 months ago

Hey @vwxyzjn, it's been quite a few extremely busy months, but I finally have the capacity to contribute a single-file implementation of PPO with Transformer-XL as episodic memory. The implementation would be based on my repo. Concerning benchmarks, I would like to use Memory Gym (Code, Paper).

If you are interested, I'll get started soon.

vwxyzjn commented 10 months ago

Hey @MarcoMeter, this is pretty cool! Sorry it took me a while to get back to you. Do you want to make it a bit more self-contained, like creating a cleanrl/ppo_trxl/ppo_trxl.py? You can declare the dependencies in cleanrl/ppo_trxl/pyproject.toml and cleanrl/ppo_trxl/poetry.lock.

The main things we are looking for are a succinct, understandable implementation, benchmarks, and docs (see how we documented https://docs.cleanrl.dev/rl-algorithms/dqn/#dqn_ataripy as an example).
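For reference, a self-contained subfolder along those lines might carry its own dependency declaration. The following is only a hypothetical sketch of a Poetry-style pyproject.toml; the package names and version constraints are placeholders, not the ones that would ship in the eventual PR:

```toml
# cleanrl/ppo_trxl/pyproject.toml -- hypothetical sketch, not the actual file
[tool.poetry]
name = "ppo-trxl"
version = "0.1.0"
description = "Single-file PPO + Transformer-XL (episodic memory)"

[tool.poetry.dependencies]
python = ">=3.8,<3.11"   # placeholder constraint
torch = "^1.12"          # placeholder pin
gymnasium = "^0.28"      # placeholder pin
memory-gym = "*"         # assumed package name for Memory Gym; verify

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Running `poetry lock` inside that folder would then produce the matching cleanrl/ppo_trxl/poetry.lock.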

MarcoMeter commented 9 months ago

Work is still in progress, so stay tuned ;) The current state lives at https://github.com/MarcoMeter/episodic-transformer-memory-ppo/blob/cleanrl/train.py. I'll open a PR once it's ready.

MarcoMeter commented 7 months ago

@vwxyzjn I finally resolved the issue: only one linear layer is supposed to sit between the CNN and the transformer blocks. Quite surprising that the additional layer hampered performance so much.
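To illustrate the wiring that comment describes, here is a minimal, hypothetical PyTorch sketch: a CNN encoder feeding the transformer blocks through exactly one linear projection. The transformer layers are plain nn.TransformerEncoderLayer stand-ins rather than a real Transformer-XL with recurrent memory, and all shapes and hyperparameters (84x84 RGB input, embed_dim, number of blocks) are placeholders, not the values from the actual implementation:

```python
import torch
import torch.nn as nn


class CnnToTrxl(nn.Module):
    """Sketch of the described encoder wiring: a CNN feature extractor
    followed by exactly ONE linear projection into the transformer blocks.
    A second projection at this point is what reportedly hurt performance."""

    def __init__(self, embed_dim: int = 384, num_blocks: int = 3):
        super().__init__()
        self.cnn = nn.Sequential(  # Atari-style encoder, assumed 3x84x84 input
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # The single linear layer between the CNN and the transformer blocks.
        self.proj = nn.Linear(64 * 7 * 7, embed_dim)
        # Stand-in blocks; a real Transformer-XL adds recurrent memory and
        # relative positional encodings, which are omitted here.
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
             for _ in range(num_blocks)]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.proj(self.cnn(obs))  # (B, embed_dim)
        h = h.unsqueeze(1)            # (B, 1, embed_dim): one token per step
        for block in self.blocks:
            h = block(h)
        return h.squeeze(1)


if __name__ == "__main__":
    dummy = torch.randn(8, 3, 84, 84)   # batch of fake RGB observations
    print(CnnToTrxl()(dummy).shape)     # torch.Size([8, 384])
```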