vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Add PPO + Transformer-XL #459

Closed MarcoMeter closed 2 weeks ago

MarcoMeter commented 5 months ago

Description

Implementation of PPO with Transformer-XL as episodic memory. Based on this repo and paper.


vercel[bot] commented 5 months ago

The latest updates on your projects.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| cleanrl | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Sep 18, 2024 4:49am |

MarcoMeter commented 5 months ago

pre-commit

pre-commit fails because of two "obsolete" (seemingly unused) imports: memory_gym and PoMEnv. Without those imports, the environments are not registered with gymnasium.
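The reason the linter is wrong here is that the registration happens as an import side effect. A minimal stand-in sketch of the mechanism (the `REGISTRY` dict and `register` helper below are illustrative stand-ins, not gymnasium's actual internals); the conventional fix is a `# noqa: F401` comment on the import:

```python
# Stand-in for gymnasium's internal registry, to show why an "unused"
# import is still load-bearing: registration runs at import time.
REGISTRY = {}

def register(env_id: str, entry_point: str) -> None:
    """Record an environment under its id (stand-in for gymnasium.register)."""
    REGISTRY[env_id] = entry_point

# A module like pom_env.py calls register(...) at the top level, so merely
# importing the module populates the registry:
register("ProofofMemory-v0", "pom_env:PoMEnv")

# In the single-file script, the pre-commit-friendly form would be:
#   import pom_env  # noqa: F401  -- imported only for its side effect
```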

enjoy.py

I added a script to load a trained model and then watch an episode.
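Such a script typically follows a load-then-rollout skeleton. This is a hedged sketch of that skeleton only; `DummyEnv` and the greedy lambda policy are placeholders, not the PR's actual agent, checkpoint loading, or Transformer-XL memory handling:

```python
class DummyEnv:
    """Placeholder environment so the rollout loop is runnable;
    the real script would construct the gymnasium environment instead."""
    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # observation

    def step(self, action):
        self.t += 1
        reward = 1.0
        done = self.t >= self.horizon
        return 0.0, reward, done

def enjoy(env, policy):
    """Roll out one episode with a loaded policy; return the episodic return."""
    obs = env.reset()
    episodic_return, done = 0.0, False
    while not done:
        action = policy(obs)  # real script: query the trained agent (plus memory)
        obs, reward, done = env.step(action)
        episodic_return += reward
    return episodic_return

episodic_return = enjoy(DummyEnv(), policy=lambda obs: 0)
```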

ProofofMemory-v0 and MiniGrid-MemoryS9-v0

These environments require memory and converge quickly, which is why I included them initially. MemoryGym environments take more time and resources (especially GPU memory, due to Transformer-XL's cached hidden states).
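The GPU-memory cost of the cached hidden states can be estimated with a back-of-envelope calculation: one hidden vector per layer, per cached timestep, per parallel environment. The configuration values below are illustrative assumptions, not the PR's actual hyperparameters:

```python
def trxl_cache_bytes(num_layers: int, memory_length: int, hidden_dim: int,
                     num_envs: int, bytes_per_elem: int = 4) -> int:
    """Approximate size of a Transformer-XL hidden-state cache in bytes:
    layers x cached timesteps x hidden dim x parallel envs x element size."""
    return num_layers * memory_length * hidden_dim * num_envs * bytes_per_elem

# Illustrative configuration (assumed, not taken from the PR). Note this is
# the live cache only; storing per-sample memories in the rollout buffer
# multiplies the footprint further.
cache_gib = trxl_cache_bytes(num_layers=3, memory_length=119,
                             hidden_dim=384, num_envs=32) / 2**30
```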

TODO

I still have to run the benchmarks and write the documentation. Besides that, the single-file implementation is basically done. I tried to stay close to ppo_atari_lstm.py.
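Like ppo_atari_lstm.py carries LSTM state across steps, Transformer-XL attends over a sliding window of cached hidden states. A minimal sketch of the window indexing (a sketch of the general technique, not this file's exact code):

```python
def memory_window_indices(t: int, memory_length: int) -> list:
    """Indices of cached timesteps that step t may attend to:
    the most recent `memory_length` steps before t."""
    start = max(0, t - memory_length)
    return list(range(start, t))

# Early in an episode the window is short, then it slides:
# memory_window_indices(2, 4)  -> [0, 1]
# memory_window_indices(10, 4) -> [6, 7, 8, 9]
```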

roger-creus commented 2 months ago

Hey! This looks pretty impressive! Just curious, what is the state of this PR?

MarcoMeter commented 2 months ago

Hi @roger-creus, the benchmarks just completed. The next step is to prepare the reports and then write the docs.

roger-creus commented 2 months ago

Nice! Looking forward to the results

MarcoMeter commented 2 months ago

It reproduces the results of my paper: https://arxiv.org/abs/2309.17207

and this is the original implementation: https://github.com/MarcoMeter/neroRL

roger-creus commented 2 months ago

I'm curious about how it performs in other environments (e.g. Atari).

MarcoMeter commented 3 weeks ago

IMHO, here are the remaining TODOs of this PR:

@roger-creus I don't have results on Atari.

vwxyzjn commented 2 weeks ago

> Keep or remove the Proof of Memory environment (cleanrl/ppo_trxl/pom_env.py)?

Feel free to keep it.

Do you know why the wandb chart looks like this?

[wandb chart screenshot]
MarcoMeter commented 2 weeks ago

> Do you know why the wandb chart looks like this?

What are you referring to? This is how I created the report:

```bat
@echo off
python -m openrlbenchmark.rlops ^
    --filters "?we=openrlbenchmark&wpn=cleanRL&ceik=env_id&cen=exp_name&metric=episode/r_mean" ^
    "ppo_trxl?cl=PPO-TrXL" ^
    --env-ids MortarMayhem-Grid-v0 MortarMayhem-v0 Endless-MortarMayhem-v0 MysteryPath-Grid-v0 MysteryPath-v0 Endless-MysteryPath-v0 SearingSpotlights-v0 Endless-SearingSpotlights-v0 ^
    --no-check-empty-runs ^
    --pc.ncols 3 ^
    --pc.ncols-legend 3 ^
    --rliable ^
    --rc.score_normalization_method maxmin ^
    --rc.normalized_score_threshold 1.0 ^
    --rc.sample_efficiency_plots ^
    --rc.sample_efficiency_and_walltime_efficiency_method Median ^
    --rc.performance_profile_plots ^
    --rc.aggregate_metrics_plots ^
    --rc.sample_efficiency_num_bootstrap_reps 10 ^
    --rc.performance_profile_num_bootstrap_reps 10 ^
    --rc.interval_estimates_num_bootstrap_reps 10 ^
    --output-filename memgym/compare ^
    --scan-history ^
    --report
```

Thanks for your feedback =)

vwxyzjn commented 2 weeks ago

Oh I meant the error bar (shadow region) is very large for some reason, but it’s fine. I have added you to the list of contributors. Feel free to merge after CI passes.
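Wide shaded regions in rliable-style plots typically come from bootstrapped interval estimates over a small number of seeds (and the report script above uses only 10 bootstrap reps). A minimal sketch of a percentile bootstrap of the mean, with illustrative scores rather than the benchmark's data:

```python
import random

def bootstrap_interval(scores, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean: resample with replacement,
    record each resample's mean, take the alpha/2 and 1-alpha/2 quantiles."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(scores, k=len(scores))) / len(scores)
        for _ in range(reps)
    )
    lo = means[int(reps * alpha / 2)]
    hi = means[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

# With only a handful of seeds and high run-to-run variance,
# the resulting interval (shaded region) is wide:
lo, hi = bootstrap_interval([0.2, 0.9, 0.4, 0.8, 0.3])
```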

MarcoMeter commented 2 weeks ago

It seems that other reports have this as well, like: https://wandb.ai/openrlbenchmark/cleanrl/reports/CleanRL-PPG-vs-PPO-results--VmlldzoyMDY2NzQ5

MarcoMeter commented 2 weeks ago

I did some refinements:

My last step before merging is to make sure that poetry and the dependencies blend well.

MarcoMeter commented 2 weeks ago

> My last step before merging is to make sure that poetry and the dependencies blend well.

Done.