vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Multi-objective hyperparameter optimization (DRAFT) #269

Open vwxyzjn opened 2 years ago

vwxyzjn commented 2 years ago

Description

This PR closes #265.

I have some preliminary multi-objective results, shown in the following figure. The x-axis is the normalized score of CartPole-v1 and Acrobot-v1, and the y-axis is the average runtime (in seconds).

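[Figure: scatter plot of normalized score (x-axis) against average runtime in seconds (y-axis), with the Pareto front in red; screenshot from 2022-08-28]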

The Pareto front is highlighted in red; from it we can pick a set of hyperparameters that achieves a high normalized score while remaining fast.
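For readers unfamiliar with the term, here is a minimal sketch of how a Pareto front over these two objectives could be extracted from a list of trials. It is not the code in this PR: the `Trial` tuple layout, the function names, and the sample numbers are all hypothetical and chosen only for illustration. It assumes the normalized score is maximized and the runtime is minimized.

```python
from typing import List, Tuple

# Hypothetical layout: (normalized_score, runtime_seconds).
Trial = Tuple[float, float]


def dominates(a: Trial, b: Trial) -> bool:
    """Return True if trial a dominates trial b.

    a dominates b when it is no worse on both objectives
    (score is maximized, runtime is minimized) and strictly
    better on at least one of them.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(trials: List[Trial]) -> List[Trial]:
    """Keep only the trials that no other trial dominates (O(n^2) scan)."""
    front = [t for t in trials if not any(dominates(o, t) for o in trials)]
    return sorted(front)


# Made-up sample data, not results from this PR.
trials = [
    (0.20, 30.0),
    (0.50, 45.0),
    (0.45, 60.0),   # dominated by (0.50, 45.0)
    (0.90, 80.0),
    (0.90, 120.0),  # dominated by (0.90, 80.0)
    (0.70, 50.0),
]

front = pareto_front(trials)
print(front)
# [(0.2, 30.0), (0.5, 45.0), (0.7, 50.0), (0.9, 80.0)]
```

Every point on the front is a defensible trade-off: moving along it buys a higher score only at the cost of a longer runtime, so the final pick depends on how much runtime is acceptable.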

Types of changes

Checklist:

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.

vercel[bot] commented 2 years ago

The latest updates on your projects.

| Name | Status | Updated |
| --- | --- | --- |
| cleanrl | ✅ Ready | Aug 28, 2022 at 10:48PM (UTC) |