utiasDSL / safe-control-gym

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
https://www.dynsyslab.org/safe-robot-learning/
MIT License
637 stars, 132 forks

Hyperparameter Optimization Module #164

Open middleyuan opened 2 months ago

middleyuan commented 2 months ago

This PR has two primary purposes:

  1. Include the state-of-the-art package Google-Vizier as a hyperparameter optimization solver.
  2. Change the database to SQLite for easier usage.

adamhall commented 2 months ago

Looks much cleaner! Nice! I think the samplers could still be made a little more flexible? Let me know if you think this is feasible.

middleyuan commented 1 month ago

Looks pretty good! I have a couple thoughts though:

  1. It seems like there is a lot of controller-specific code. For example, there is a separate sampler for each controller, and the HPO code has if statements that depend on the algorithm being optimized. I'm wondering if there is a way to make this more generic by making the HPO yaml more complex and having the underlying code build classes from the arguments in these yamls? For example, I think hpo_sampler.py could be defined almost entirely in a yaml, with a generic sampler class that parses the yaml appropriately. It feels like there is a lot of repeated code that could be simplified, which would also make adding future algos simpler.
  2. The HPO class is defined in both hpo_optuna.py and hpo_vizier.py, which seem to share a lot of code. I think there should really be a parent HPO class, with child subclasses for the different use cases?
  3. I'm not totally sure why the files are being removed from the examples. Is it because you are replacing them with better hyperparameters?

  1. I have made the HPO code more generic. The reason I don't define the hyperparameter search space in yaml is that I don't want to burden users, since doing so usually requires some knowledge of the code and the algorithms.
  2. Yes, the corresponding changes have been made.
  3. The changes include refactoring to make the folder structure consistent with the other examples.
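
For readers following the design discussion, here is a minimal sketch of the two ideas raised in points 1 and 2: a single sampler driven by a declarative search-space description, and a parent HPO class whose subclasses only decide how trials are proposed and recorded. It is not the code in this PR. Every name here (`SEARCH_SPACE`, `GenericSampler`, `BaseHPO`, `RandomSearchHPO`) and every hyperparameter key is hypothetical, a plain dict stands in for a parsed yaml file, and plain random search stands in for the Optuna and Vizier backends so the example runs with the standard library only.

```python
import math
import random
from abc import ABC, abstractmethod

# Hypothetical search space, shaped like the dict a yaml parser would return.
SEARCH_SPACE = {
    "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-2, "log": True},
    "hidden_dim": {"type": "categorical", "choices": [32, 64, 128]},
    "horizon": {"type": "int", "low": 5, "high": 30},
}


class GenericSampler:
    """Draws one hyperparameter set from a spec, with no per-controller code."""

    def __init__(self, space, seed=None):
        self.space = space
        self.rng = random.Random(seed)

    def sample(self):
        params = {}
        for name, spec in self.space.items():
            kind = spec["type"]
            if kind == "float" and spec.get("log", False):
                # Sample uniformly in log space, then map back.
                lo, hi = math.log(spec["low"]), math.log(spec["high"])
                params[name] = math.exp(self.rng.uniform(lo, hi))
            elif kind == "float":
                params[name] = self.rng.uniform(spec["low"], spec["high"])
            elif kind == "int":
                # randint includes both endpoints.
                params[name] = self.rng.randint(spec["low"], spec["high"])
            elif kind == "categorical":
                params[name] = self.rng.choice(spec["choices"])
            else:
                raise ValueError(f"unknown parameter type for {name!r}: {kind!r}")
        return params


class BaseHPO(ABC):
    """Shared optimization loop; subclasses supply propose() and report()."""

    def __init__(self, objective, n_trials):
        self.objective = objective
        self.n_trials = n_trials

    @abstractmethod
    def propose(self):
        """Return the next hyperparameter dict to evaluate."""

    @abstractmethod
    def report(self, params, score):
        """Record the outcome of one trial."""

    def optimize(self):
        # Maximizes the objective and returns the best trial seen.
        best_params, best_score = None, float("-inf")
        for _ in range(self.n_trials):
            params = self.propose()
            score = self.objective(params)
            self.report(params, score)
            if score > best_score:
                best_params, best_score = params, score
        return best_params, best_score


class RandomSearchHPO(BaseHPO):
    """Stand-in backend: proposes independent random samples."""

    def __init__(self, objective, n_trials, space, seed=None):
        super().__init__(objective, n_trials)
        self.sampler = GenericSampler(space, seed)
        self.history = []

    def propose(self):
        return self.sampler.sample()

    def report(self, params, score):
        self.history.append((params, score))
```

Under this layout, a backend-specific subclass would override only `propose` and `report`, and adding a new controller would mean adding a new search-space description rather than a new sampler. The trade-off raised in reply 1 still applies: whoever writes that description needs to know which ranges are sensible for the algorithm.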

General comments: to get the HPO module fully tested, I am waiting for another PR (quadrotor interface) to be approved. After that, I will run the unit tests for HPO on the new env interface and also run the pre-commit hooks.