vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.41k stars 616 forks

Add gymnasium support for TD3 #377

Closed pseudo-rnd-thoughts closed 12 months ago

pseudo-rnd-thoughts commented 1 year ago

Description

Types of changes

Checklist:

If you need to run benchmark experiments for a performance-impacting change:

vwxyzjn commented 1 year ago

Hi @pseudo-rnd-thoughts thanks for the PR! Could you do a filediff between td3 variants and ddpg variants to minimize the lines of code differences? You can select two files in vscode, right click them, and select "compare selected", which should produce something like below.

[image: screenshot of the file comparison in VS Code]
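For readers who prefer a script to the editor's comparison view, the same check can be roughed out with Python's standard `difflib`. This is only a minimal sketch: the helper name `diff_stats` and the in-memory example lines are illustrative assumptions, not part of CleanRL or this PR.

```python
import difflib


def diff_stats(a_lines, b_lines):
    """Return (removed, added) line counts between two lists of lines.

    Hypothetical helper for illustration; not part of CleanRL.
    """
    removed = added = 0
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1  # lines present only in the first file
            added += j2 - j1    # lines present only in the second file
    return removed, added


if __name__ == "__main__":
    # In-memory stand-ins for the two scripts being compared (assumed content).
    ddpg_like = ["x = 1", "y = 2", "z = 3"]
    td3_like = ["x = 1", "y = 5", "z = 3", "w = 4"]
    print(diff_stats(ddpg_like, td3_like))
    # Show the differences in the familiar unified format.
    for line in difflib.unified_diff(ddpg_like, td3_like, "ddpg", "td3", lineterm=""):
        print(line)
```

To compare real files, read each one with `pathlib.Path(path).read_text().splitlines()` and pass the two lists to `diff_stats`; a smaller total means the two scripts are closer to line-for-line identical.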

pseudo-rnd-thoughts commented 1 year ago

> Hi @pseudo-rnd-thoughts thanks for the PR! Could you do a filediff between td3 variants and ddpg variants to minimize the lines of code differences? You can select two files in vscode, right click them, and select "compare selected", which should produce something like below.

Done in fed8aaf09c880e7e8df9a517a850c51cede99a29. The differences between DDPG and TD3 should now be solely due to TD3's heuristic improvements. The clipping part is relatively confusing and unclear, but that could be looked at in another PR.
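For context, here is a rough sketch of how the TD3 target differs from the DDPG target, written for a scalar action in plain Python. It assumes the "clipping part" refers to the noise and action clipping in TD3's target policy smoothing, which the thread does not confirm. All function names and default values are illustrative rather than taken from the CleanRL scripts. TD3's third change, delayed policy updates, is not shown.

```python
import random


def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def ddpg_target(reward, done, next_action, q_next_fn, gamma=0.99):
    """DDPG: bootstrap from a single target critic at the target action."""
    return reward + (1.0 - done) * gamma * q_next_fn(next_action)


def smooth_target_action(next_action, policy_noise=0.2, noise_clip=0.5,
                         action_low=-1.0, action_high=1.0, rng=random):
    """TD3 target policy smoothing: add clipped Gaussian noise to the
    target action, then clip the result to the valid action range."""
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    return clip(next_action + noise, action_low, action_high)


def td3_target(reward, done, next_action, q1_next_fn, q2_next_fn,
               gamma=0.99, **smoothing_kwargs):
    """TD3: smooth the target action, then bootstrap from the minimum of
    two target critics (clipped double Q-learning)."""
    smoothed = smooth_target_action(next_action, **smoothing_kwargs)
    min_q = min(q1_next_fn(smoothed), q2_next_fn(smoothed))
    return reward + (1.0 - done) * gamma * min_q


if __name__ == "__main__":
    # Constant critics as stand-ins for the target Q-networks (assumption).
    q1 = lambda a: 1.0
    q2 = lambda a: 2.0
    print(ddpg_target(1.0, 0.0, 0.3, q2, gamma=0.5))
    print(td3_target(1.0, 0.0, 0.3, q1, q2, gamma=0.5))
```

With the two critics disagreeing, the TD3 target uses the smaller estimate, which is the mechanism intended to curb the overestimation bias of a single critic.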

sdpkjc commented 12 months ago

👋Hi there,

I've noticed that this PR hasn't had any activity for several months and it now only requires some clean-up work and conflict resolution to complete. I would like to volunteer to take over and finish off this work if there's no objection. Your efforts on this so far have been much appreciated, and I look forward to the possibility of being able to contribute to its completion.

vwxyzjn commented 12 months ago

@sdpkjc feel free to. Thanks so much!

sdpkjc commented 12 months ago

[images: pr-377, pr-377-time]

sdpkjc commented 12 months ago

[images: pr-377-jax, pr-377-jax-time]

sdpkjc commented 12 months ago

I've already run the RLops process and everything is performing as expected. If there are no other issues, I will proceed to merge this branch.

vwxyzjn commented 12 months ago

LGTM! Feel free to merge.

pseudo-rnd-thoughts commented 12 months ago

@sdpkjc Apologies for not finishing this, and thanks for doing the rest of it.