vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

DQN + Atari + JAX #231

Closed: yooceii closed this 2 years ago

yooceii commented 2 years ago

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

yooceii commented 2 years ago

The JAX version gains roughly 40% in SPS (steps per second) with the same episodic return (epi_ret).

Full report here: https://wandb.ai/yooceii/dqn-atari-jax/reports/DQN-JAX--VmlldzoyMjkzMTM2

yooceii commented 2 years ago

Looks like combining linear_schedule and select_action into a single function and jitting it does give slightly better performance. It now gains roughly 50% more SPS. @vwxyzjn The Wandb report is also updated.
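For readers following the thread, here is a minimal sketch of what folding the epsilon schedule into a jitted epsilon-greedy action selection could look like. It is not the code from this PR: the function signatures, the hard-coded hyperparameters (`1.0`, `0.01`, `10_000`), and the choice to pass precomputed `q_values` instead of network parameters are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def linear_schedule(start_e, end_e, duration, t):
    """Linearly anneal epsilon from start_e to end_e over `duration` steps."""
    slope = (end_e - start_e) / duration
    return jnp.maximum(slope * t + start_e, end_e)


@jax.jit
def select_action(key, q_values, global_step):
    """Epsilon-greedy selection with the schedule computed inside the jitted call."""
    # Hypothetical hyperparameters, hard-coded here for illustration only.
    epsilon = linear_schedule(1.0, 0.01, 10_000, global_step)

    # Split the PRNG key: one subkey decides explore vs. exploit,
    # one subkey samples the random action, and the rest is returned.
    key, explore_key, action_key = jax.random.split(key, 3)

    num_actions = q_values.shape[-1]
    random_action = jax.random.randint(action_key, (), 0, num_actions)
    greedy_action = jnp.argmax(q_values, axis=-1)

    # jnp.where keeps the branch traceable, so no Python `if` on a traced value.
    explore = jax.random.uniform(explore_key) < epsilon
    action = jnp.where(explore, random_action, greedy_action)
    return action, key


key = jax.random.PRNGKey(0)
q_values = jnp.array([0.1, 0.9, 0.3])  # placeholder Q-values for 3 actions
action, key = select_action(key, q_values, 0)
```

Because the schedule, the random draw, and the argmax all live in one compiled function, each environment step makes a single jitted call instead of mixing Python-side epsilon computation with separate device calls, which is one plausible source of the SPS gain described above.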

vwxyzjn commented 2 years ago

This is awesome work. Thank you! @kinalmehta would you mind including the jitted action sampling function in #222? I think this will be the last thing before we merge. Since this is a non-breaking change for #222, we don't need to re-run the benchmark.

vwxyzjn commented 2 years ago

Closed in favor of https://github.com/vwxyzjn/cleanrl/pull/222