vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Prototype TD3 with JAX #216

vwxyzjn closed this issue 2 years ago

vwxyzjn commented 2 years ago

Problem Description

Given the incredible performance of the DDPG + JAX prototype (#187), it's worth prototyping TD3 + JAX as well. @joaogui1 is super experienced with JAX and has expressed interest in working on this. Thanks @joaogui1 for your interest! This issue tracks the development effort.

I suggest extending the DDPG prototype to work with TD3. Here are a couple of additional resources:

  1. CleanRL's DDPG docs: https://docs.cleanrl.dev/rl-algorithms/ddpg/
  2. CleanRL's TD3 docs: https://docs.cleanrl.dev/rl-algorithms/td3/

To see exactly how CleanRL's DDPG differs from TD3, run a file diff between ddpg_continuous_action.py and td3_continuous_action.py, which shows the code differences explicitly:

[Image: file diff between ddpg_continuous_action.py and td3_continuous_action.py]
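
For readers without the diff at hand, here is a minimal JAX sketch of the three changes TD3 is generally described as making on top of DDPG: target policy smoothing, clipped double-Q targets, and delayed actor updates. It is not taken from CleanRL's files. The function names (`td3_target`, `should_update_actor`), the callable arguments, and the default values for `policy_noise`, `noise_clip`, and `policy_frequency` are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def td3_target(key, next_obs, rewards, dones,
               actor_target, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0):
    """Hypothetical helper: TD3 bootstrap target for a batch of transitions.

    actor_target, q1_target and q2_target are plain callables standing in
    for the target networks; their signatures are assumed, not CleanRL's.
    """
    next_actions = actor_target(next_obs)
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then clip back into the valid action range.
    noise = jnp.clip(
        jax.random.normal(key, next_actions.shape) * policy_noise,
        -noise_clip, noise_clip)
    next_actions = jnp.clip(next_actions + noise, action_low, action_high)
    # Clipped double-Q: bootstrap from the element-wise minimum of the
    # two target critics instead of a single critic as in DDPG.
    min_q = jnp.minimum(q1_target(next_obs, next_actions),
                        q2_target(next_obs, next_actions))
    return rewards + (1.0 - dones) * gamma * min_q


def should_update_actor(global_step, policy_frequency=2):
    # Delayed policy updates: the actor (and target networks) are updated
    # only on every policy_frequency-th critic update.
    return global_step % policy_frequency == 0


# Toy usage with stand-in functions instead of real networks.
key = jax.random.PRNGKey(0)
obs = jnp.ones((4, 3))
actor = lambda o: jnp.tanh(o.sum(axis=-1, keepdims=True))
q1 = lambda o, a: o.sum(axis=-1) + a[:, 0]
q2 = lambda o, a: o.sum(axis=-1) - a[:, 0]
target = td3_target(key, obs, jnp.zeros(4), jnp.zeros(4), actor, q1, q2)
```

In a DDPG-style prototype, the equivalent target would use one critic and the unperturbed target action, so these are the places where the two scripts would be expected to diverge.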

The contribution checklist should help when preparing the PR. See https://github.com/vwxyzjn/cleanrl/pull/186 for an example.

Thanks again @joaogui1 and let me know if you run into any issues!

vwxyzjn commented 2 years ago

Closed by #219