vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.54k stars, 631 forks

Added TQC #262

Closed AdityaGudimella closed 10 months ago

AdityaGudimella commented 2 years ago

Description

Closes #258. Implements Truncated Quantile Critics (TQC).
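For readers unfamiliar with TQC, its distinguishing step is truncation: the quantile atoms predicted by all critics are pooled, sorted, and the largest ones are dropped before the target is formed. The following is a minimal pure-Python sketch of that step only, not the code in this PR. The function name `truncated_target_atoms` and its parameters are hypothetical, and `entropy_term` stands in for the SAC-style `alpha * log_prob` correction.

```python
from typing import List


def truncated_target_atoms(
    next_atoms_per_critic: List[List[float]],
    drop_per_critic: int,
    reward: float,
    gamma: float,
    entropy_term: float = 0.0,
) -> List[float]:
    """Pool the atoms of all critics, drop the largest, build target atoms.

    Hypothetical helper for illustration; names are not taken from the PR.
    """
    n_critics = len(next_atoms_per_critic)
    # Pool every atom predicted by every critic for the next state-action pair.
    pooled = sorted(a for atoms in next_atoms_per_critic for a in atoms)
    # Drop the top `drop_per_critic * n_critics` atoms to curb overestimation.
    n_keep = len(pooled) - drop_per_critic * n_critics
    if n_keep <= 0:
        raise ValueError("drop_per_critic removes every atom")
    kept = pooled[:n_keep]
    # Assumption: entropy_term plays the role of alpha * log pi(a'|s').
    return [reward + gamma * (z - entropy_term) for z in kept]


# Two critics with three atoms each; drop one atom per critic (two in total).
atoms = [[1.0, 4.0, 2.0], [3.0, 0.0, 5.0]]
targets = truncated_target_atoms(atoms, drop_per_critic=1, reward=1.0, gamma=0.5)
print(targets)
```

In a real implementation the same pooling and truncation would be done on batched tensors, and each critic's quantiles would then be regressed onto the kept target atoms with a quantile Huber loss.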

Checklist:

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

AdityaGudimella commented 2 years ago

I can run the algo against the same MuJoCo envs used in the paper. Would one run per env be sufficient? Also, how do I share the results of the runs with you?

vwxyzjn commented 2 years ago

Thank you @AdityaGudimella. The variant looks good. I suggest running some preliminary experiments in your own wandb namespace and creating a wandb report to share the findings.

AdityaGudimella commented 2 years ago

Apologies for the delay on this. I ran 2 trials each of Hopper-v3, Humanoid-v3 and Swimmer-v3, and didn't have any resources available to run experiments after that. I've just started 2 trials each of HalfCheetah-v3, Ant-v3 and Walker2d-v3. Once those experiments are done (probably in 2 days), I will share the report here.

vwxyzjn commented 1 year ago

That sounds good. Feel free to ping me if this is ready for review or if there is anything I can help with.