JAX Integration with CleanRL

Problem Description

Given the incredible performance of the DDPG + JAX prototype (https://github.com/vwxyzjn/cleanrl/pull/187), it's worth prototyping JAX with other algorithms as well! This issue tracks the overall progress of integrating JAX with CleanRL.

Useful resources

- A working JAX + DDPG example as a reference implementation: https://github.com/vwxyzjn/cleanrl/pull/187
- CleanRL's DDPG docs: https://docs.cleanrl.dev/rl-algorithms/ddpg/
- A working JAX + PPO example as a reference implementation: #217
- CleanRL's PPO docs: https://docs.cleanrl.dev/rl-algorithms/ppo/

Common gotchas and errors:

Useful pattern when extending

In CleanRL, a file diff is incredibly helpful. For example, to learn how TD3 differs from DDPG, I could do the following:

- Open VS Code and select ddpg_continuous_action.py and td3_continuous_action.py.
- Right-click the selection and choose "Compare Selected".
- A file diff window opens, showing the two files side by side.
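
The same comparison can also be sketched outside the editor with Python's standard `difflib` module. This is only a minimal sketch and not part of CleanRL: `file_diff` is a hypothetical helper name, and the demo assumes both scripts sit in the current working directory.

```python
import difflib
from pathlib import Path


def file_diff(path_a, path_b, context=3):
    """Return a unified diff between two text files as one string.

    `file_diff` is a hypothetical helper for illustration, not a CleanRL API.
    An empty string means the files are identical.
    """
    a_lines = Path(path_a).read_text().splitlines(keepends=True)
    b_lines = Path(path_b).read_text().splitlines(keepends=True)
    diff = difflib.unified_diff(
        a_lines,
        b_lines,
        fromfile=str(path_a),
        tofile=str(path_b),
        n=context,  # number of unchanged context lines around each change
    )
    return "".join(diff)


if __name__ == "__main__":
    # Assumed locations: adjust the paths to wherever the scripts live locally.
    ddpg = Path("ddpg_continuous_action.py")
    td3 = Path("td3_continuous_action.py")
    if ddpg.exists() and td3.exists():
        print(file_diff(ddpg, td3))
```

Lines prefixed with `-` appear only in the first file and lines prefixed with `+` only in the second, so the output reads as "what has to change to turn DDPG into TD3".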

Contribution process

There is a contribution checklist to help streamline the contribution process. Each new contribution needs documentation, tests, benchmark experiments, etc. See https://github.com/vwxyzjn/cleanrl/pull/186 as an example.

Tracked issues

[x] #216
[ ] #217
[x] #219
[x] #220
[x] #221
