vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.26k stars 602 forks

Brax + PPO integration #313

Open vwxyzjn opened 1 year ago

vwxyzjn commented 1 year ago

Description

Tests an integration with Brax. It appears to work out of the box, without implementing observation normalization: https://wandb.ai/costa-huang/cleanRL/runs/2aemjwey?workspace=user-costa-huang


Compilation takes ~400 seconds, and reaching a return of ~6000 in Ant takes about 100 seconds on a GPU. In comparison, the official demo takes 30 seconds to compile and about 80 seconds to reach ~8000 (presumably on a TPU). Our compilation is much slower, most likely because we don't use lax.scan or jax.lax.fori_loop, but once compilation finishes, throughput is about 600k SPS (environment steps per second).
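A minimal sketch, in JAX, of why a Python loop inside a jitted function can inflate compile time while lax.scan does not. It assumes a toy environment: toy_step, policy, NUM_STEPS, and traced_op_count are hypothetical illustrations, not Brax or CleanRL code, and this is not how the PR's script is written.

```python
import jax
import jax.numpy as jnp

NUM_STEPS = 128  # hypothetical rollout length


def toy_step(state, action):
    # Hypothetical stand-in for an environment step; NOT the Brax API.
    next_state = 0.99 * state + 0.1 * action
    reward = -jnp.sum(next_state ** 2)
    return next_state, reward


def policy(params, state):
    # Hypothetical deterministic policy: elementwise tanh of a scaled state.
    return jnp.tanh(params * state)


def unrolled_rollout(params, state):
    # A Python for-loop is unrolled at trace time: the traced program holds
    # NUM_STEPS copies of the step, so XLA compile time grows with NUM_STEPS.
    rewards = []
    for _ in range(NUM_STEPS):
        state, reward = toy_step(state, policy(params, state))
        rewards.append(reward)
    return state, jnp.stack(rewards)


def scan_rollout(params, state):
    # lax.scan traces the body once and emits a single loop primitive,
    # so the traced program size does not depend on NUM_STEPS.
    def body(carry, _):
        next_state, reward = toy_step(carry, policy(params, carry))
        return next_state, reward

    return jax.lax.scan(body, state, None, length=NUM_STEPS)


def traced_op_count(fn, *args):
    # Number of top-level primitives in the traced program (rough proxy for graph size).
    return len(jax.make_jaxpr(fn)(*args).jaxpr.eqns)


rollout_unrolled_jit = jax.jit(unrolled_rollout)
rollout_scan_jit = jax.jit(scan_rollout)
```

With params = jnp.ones(8) and state = jnp.ones(8), both jitted functions return the same final state and per-step rewards, but traced_op_count reports over a thousand top-level ops for unrolled_rollout versus only a few for scan_rollout. jax.lax.fori_loop is the analogous choice when no per-step outputs need to be stacked.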

CC @joaogui1

Types of changes

Checklist:

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 1 year ago

The latest updates on your projects.

Name: cleanrl | Status: ✅ Ready | Preview: Visit Preview | Updated: Nov 6, 2022 at 9:28 PM (UTC)

Surya-77 commented 6 months ago

Hi @vwxyzjn ,

I hope you're doing well. I was reviewing PR #313 (Brax + PPO integration) and noticed that it's currently closed. Have there been any difficulties merging this change into the main repository? Also, is there an updated version of this integration that addresses those issues or incorporates newer changes? Looking forward to your response.

Best regards, Surya