vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.26k stars, 602 forks

Re-benchmarking refactored algorithms #289

Closed jbuckman closed 1 year ago

jbuckman commented 1 year ago

Problem Description

The performance numbers recorded for the benchmarking scripts (https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Open-RL-Benchmark-0-6-0---Vmlldzo0MDcxOA) no longer match the actual performance of the CleanRL algorithms.

Steps to Reproduce

python ppo_continuous_action.py --env-id Hopper-v4 --total-timesteps 2000000

This run performs far worse, and runs far slower, than the benchmarked example.

vwxyzjn commented 1 year ago

Please see https://wandb.ai/openrlbenchmark/openrlbenchmark/reportlist

There is also a deprecation notice in the link you pasted. It’s better to look at the experiment results in the docs, which are more up to date. See https://docs.cleanrl.dev/rl-algorithms/ddpg/#experiment-results as an example.

Please reopen the issue if you have additional questions.