thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License
7.83k stars, 1.12k forks

Adding Hyperparameter Optimisation (HPO) #978

Open bordeauxred opened 11 months ago

bordeauxred commented 11 months ago

RL experiment results often depend heavily on the chosen seeds, with high variance between seeds. As an evaluation procedure, the paper proposes defining and reporting disjoint sets of training and evaluation seeds. Each run (plain RL or HPO+RL) is performed on the set of training seeds and evaluated on the set of test seeds.
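A minimal sketch of such a protocol, to make the idea concrete. It is not taken from the paper or from tianshou: `make_seed_sets`, `evaluate_config`, and the toy `toy_train` / `toy_evaluate` functions are hypothetical names, and the toy functions only stand in for a real training and evaluation run.

```python
import random
import statistics
from typing import Callable, Sequence


def make_seed_sets(n_train: int, n_test: int, master_seed: int = 0) -> tuple[list[int], list[int]]:
    """Draw disjoint training and test seed sets from one master seed."""
    rng = random.Random(master_seed)
    # sample() draws without replacement, so the two slices cannot overlap.
    seeds = rng.sample(range(2**31), n_train + n_test)
    return seeds[:n_train], seeds[n_train:]


def evaluate_config(
    train: Callable[[int], object],
    evaluate: Callable[[object, int], float],
    train_seeds: Sequence[int],
    test_seeds: Sequence[int],
) -> dict:
    """Train once per training seed, then score each result on every test seed."""
    per_run = []
    for train_seed in train_seeds:
        policy = train(train_seed)
        returns = [evaluate(policy, s) for s in test_seeds]
        per_run.append(statistics.mean(returns))
    return {
        # Both seed sets are part of the report, as the protocol asks.
        "train_seeds": list(train_seeds),
        "test_seeds": list(test_seeds),
        "mean": statistics.mean(per_run),
        "std": statistics.stdev(per_run) if len(per_run) > 1 else 0.0,
    }


# Toy stand-ins for a real training and evaluation run (purely illustrative).
def toy_train(seed: int) -> float:
    return random.Random(seed).gauss(100.0, 10.0)  # "policy quality"


def toy_evaluate(policy: float, seed: int) -> float:
    return policy + random.Random(seed).gauss(0.0, 1.0)  # noisy episode return


train_seeds, test_seeds = make_seed_sets(n_train=5, n_test=10)
report = evaluate_config(toy_train, toy_evaluate, train_seeds, test_seeds)
print(report["mean"], report["std"])
```

Reporting the seed lists alongside the aggregate makes a run reproducible, and the spread across training seeds shows how much of a result is due to seed choice.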

A possible implementation strategy is to use Hydra to configure the search spaces (on top of the high-level interfaces, #970). This allows combination with (a) the Optuna Hydra sweepers and (b) the HPO sweepers from the aforementioned paper. We will contact the authors about integrating the sweepers from their repo, which contains sweepers for:

  - Differential Evolution Hyperband
  - Standard Population Based Training (with warmstarting option)
  - Population Based Bandits (with Mix/Multi versions and warmstarting option)
  - Bayesian-Generational Population Based Training

@MischaPanch

MischaPanch commented 11 months ago

@Trinkle23897 we plan to address this after the high-level interfaces from @opcode81 are merged. If you have any other proposals, we would be happy to hear them!

Existing HPO approaches include:

  1. stable-baselines zoo, which is based on plain Optuna (not through Hydra sweepers) and has a sophisticated module for experiments.

  2. NNI: @bordeauxred and I tried it and liked it, but the project seems to be dead or at least stale, which is a shame. The current version has quite a few bugs and documentation issues, and if development has indeed come to a halt, it would be better not to rely on it.

Generally, from a quick look, Hydra sweepers seem like an attractive option because they can be implemented on top of other HPO engines. Optuna already has some support, and if NNI is resurrected, it would probably be possible to write a new Hydra sweeper based on it, if ever needed.
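To illustrate why a sweeper layer can sit on top of interchangeable engines, here is a rough sketch in plain Python. It does not use the Hydra or Optuna APIs: the `Engine` protocol, `RandomSearchEngine`, `sweep`, and `toy_objective` are hypothetical names, and random search is used only as the simplest possible stand-in backend. Optuna offers a comparable ask-and-tell style of use, so an adapter with the same two methods should be feasible, but that is an assumption and is not shown here.

```python
import math
import random
from typing import Callable, Protocol


class Engine(Protocol):
    """Hypothetical ask/tell contract that any HPO backend would satisfy."""

    def ask(self) -> dict: ...
    def tell(self, config: dict, score: float) -> None: ...


class RandomSearchEngine:
    """Stand-in backend: samples each hyperparameter independently."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.history: list[tuple[dict, float]] = []

    def ask(self) -> dict:
        return {
            # Log-uniform learning rate in [1e-5, 1e-2].
            "lr": 10 ** self.rng.uniform(-5, -2),
            "batch_size": self.rng.choice([32, 64, 128, 256]),
        }

    def tell(self, config: dict, score: float) -> None:
        self.history.append((config, score))


def sweep(engine: Engine, objective: Callable[[dict], float], n_trials: int) -> tuple[dict, float]:
    """Engine-agnostic loop: the sweeper only ever calls ask() and tell()."""
    best_config, best_score = {}, -math.inf
    for _ in range(n_trials):
        config = engine.ask()
        score = objective(config)
        engine.tell(config, score)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config: dict) -> float:
    # Peaks at lr = 1e-3; a real objective would train and evaluate an agent.
    return -abs(math.log10(config["lr"]) + 3.0)


best_config, best_score = sweep(RandomSearchEngine(seed=0), toy_objective, n_trials=50)
print(best_config, best_score)
```

Because `sweep` depends only on `ask` and `tell`, swapping the backend would not touch the experiment code, which is the property that makes the sweeper approach attractive here.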

MischaPanch commented 8 months ago

We will do this in (at least) two stages. The first will be a proper test-evaluation protocol for a single parameter configuration. @bordeauxred is on it.