vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Multi-objective hyperparameter optimization #265

Open vwxyzjn opened 1 year ago

vwxyzjn commented 1 year ago

Overview

#228 prototyped a great initial integration with Optuna for hyperparameter optimization. However, it has a couple of downsides:

  1. lack of support for tuning across multiple environments whose reward scales are unknown
  2. no way to do multi-objective optimization, such as maximizing return while minimizing runtime or steps

I believe incorporating Optuna's multi-objective optimization API could offer an elegant solution to both issues. See https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/002_multi_objective.html.
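A minimal sketch of what that could look like, based on the linked tutorial: the objective returns one value per direction, and `create_study(directions=...)` replaces the single-objective `direction`. The hyperparameter names and the dummy training body below are hypothetical stand-ins for an actual CleanRL run.

```python
import time

import optuna


def objective(trial: optuna.Trial) -> tuple[float, float]:
    # Hypothetical hyperparameters, named after common CleanRL arguments.
    learning_rate = trial.suggest_float("learning-rate", 1e-5, 1e-2, log=True)
    num_steps = trial.suggest_categorical("num-steps", [64, 128, 256])

    start = time.time()
    # Placeholder for training: in practice, run the agent with the suggested
    # hyperparameters and compute the normalized score across environments.
    normalized_score = learning_rate * num_steps  # dummy stand-in value
    runtime = time.time() - start

    # One value per direction: maximize score, minimize runtime.
    return normalized_score, runtime


# `directions` (plural) turns on multi-objective optimization.
study = optuna.create_study(directions=["maximize", "minimize"])
study.optimize(objective, n_trials=20)

# `study.best_trials` holds the Pareto-optimal trials.
for trial in study.best_trials:
    print(trial.values, trial.params)
```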

vwxyzjn commented 1 year ago

CC @braham-snyder, we are tracking progress on developing multi-objective hyperparameter optimization here. I think a good first prototype would maximize normalized scores while minimizing runtime. Let me know if you or anyone else is interested in working on this; we always welcome new contributors :)
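For the normalized-score piece, one common scheme (an assumption on my part; the issue does not pin down an exact formula) is to rescale each environment's episodic return against a random-policy return and a maximum return, so environments with different reward scales become comparable:

```python
def normalized_score(episodic_return: float, random_return: float, max_return: float) -> float:
    """Map a raw episodic return to roughly [0, 1], relative to reference returns."""
    return (episodic_return - random_return) / (max_return - random_return)


# Hypothetical reference values: CartPole-v1 returns are capped at 500,
# and a random policy scores around 20.
print(normalized_score(450.0, 20.0, 500.0))  # ~0.90
```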

braham-snyder commented 1 year ago

Thanks -- it's unlikely I'll have anything to contribute here, but if I do I'll definitely let you know

vwxyzjn commented 1 year ago

Had some preliminary results with the multi-objective setup, as shown in the following figure. The x-axis is the normalized score across CartPole-v1 and Acrobot-v1, and the y-axis is the average runtime (in seconds).

[Figure: scatter plot of trials, normalized score (x-axis) vs. average runtime in seconds (y-axis), with the Pareto front highlighted in red]

The Pareto front is highlighted in red, so we can pick a hyperparameter set that achieves a high normalized score while keeping runtime low.
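For reference, Optuna can produce this kind of plot directly from a multi-objective study via `optuna.visualization.plot_pareto_front` (which requires the optional plotly dependency). A sketch, with a dummy objective standing in for the real tuning run:

```python
import optuna


def objective(trial: optuna.Trial) -> tuple[float, float]:
    # Dummy trade-off standing in for (normalized score, runtime).
    x = trial.suggest_float("x", 0.0, 1.0)
    return x, 1.0 - x


study = optuna.create_study(directions=["maximize", "minimize"])
study.optimize(objective, n_trials=30)

# Draws all trials and highlights the Pareto-optimal ones, as in the
# figure above.
fig = optuna.visualization.plot_pareto_front(
    study, target_names=["normalized score", "runtime (s)"]
)
fig.show()
```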