hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] Suggested Hyperparams for A2C with highway-env #1100

Open pierrekhouryy opened 3 years ago

pierrekhouryy commented 3 years ago

Can anyone suggest hyperparams for A2C when training with the parking-env from the highway-env repo?

Miffyli commented 3 years ago

This place is not really for asking for help with custom environments (these issues are meant for bugs and enhancements). Your best bet is to check projects that use that environment and see what parameters they use, or use rl-zoo to optimize the parameters yourself. Sadly, hyperparameter search is one of the less-fun parts of RL work.
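As a rough illustration of what "optimizing the parameters yourself" involves, here is a minimal random-search sketch in plain Python. The parameter names (`learning_rate`, `gamma`, `n_steps`, `ent_coef`) match A2C constructor arguments in Stable Baselines, but the ranges are illustrative assumptions, and `placeholder_score` is a hypothetical stand-in for "train A2C with these parameters and return the mean evaluation reward". It is not the search procedure rl-zoo uses.

```python
import math
import random


def sample_a2c_params(rng):
    """Draw one candidate configuration (ranges are illustrative assumptions)."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -2),  # log-uniform in [1e-5, 1e-2]
        "gamma": rng.choice([0.9, 0.95, 0.99, 0.999]),
        "n_steps": rng.choice([5, 16, 32, 64]),
        "ent_coef": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
    }


def placeholder_score(params):
    """Hypothetical objective, NOT a real training run.

    In practice this would train a model with `params` and return its
    mean evaluation reward. Here it just prefers values near 1e-3 / 0.99
    so that the search loop has something to optimize.
    """
    lr_penalty = abs(math.log10(params["learning_rate"]) + 3.0)
    gamma_penalty = abs(params["gamma"] - 0.99)
    return -(lr_penalty + gamma_penalty)


def random_search(score_fn, n_trials=20, seed=0):
    """Sample n_trials configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = sample_a2c_params(rng)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(placeholder_score)
print(best_params, best_score)
```

Dedicated tuning tools replace the uniform sampling with smarter samplers and stop bad trials early, but the overall loop (sample, evaluate, keep the best) is the same.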

araffin commented 3 years ago

Hello, why would you use A2C on such an environment instead of HER+SAC, which is much more appropriate? (It can reach 90% success in 20000 timesteps; see the zoo: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/her.yml)

pierrekhouryy commented 3 years ago

Hello @araffin, HER+SAC is in fact much more appropriate. However, I am more interested in studying the effect of using A2C with the parking environment. Thanks for your help.

StarBaseOne commented 3 years ago

Hi, do you have A2C working on that environment? I ask because parking-v0 in highway-env is a goal-oriented environment with a continuous action space, and the Stable Baselines A2C implementation won't work on it out of the box: it raises "NotImplementedError: Error: the model does not support input space of type Dict". In other words, summon-v0 and parking-v0 only work with the HER wrapper from SB/SB3, which can process the dictionary observations.

If you run SAC, TD3, DDPG, or TQC on their own on the parking-v0 environment, they won't work. If you combine them with HER, however, which is designed for such environments, they work fine out of the box.
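To make the Dict issue concrete, here is a minimal plain-Python sketch of what flattening a goal-conditioned observation into a single vector looks like, which is the kind of preprocessing a non-HER algorithm would need before it could consume such observations. The keys `observation`, `achieved_goal`, and `desired_goal` are the usual goal-environment keys; the numeric values and the sorted-key layout are assumptions made for illustration. In a real setup a wrapper such as gym's `FlattenObservation` plays this role.

```python
def flatten_goal_observation(obs):
    """Concatenate the entries of a dict observation into one flat list.

    Keys are visited in sorted order so the layout is deterministic
    (an assumption of this sketch, not a statement about any library).
    """
    flat = []
    for key in sorted(obs):
        flat.extend(float(x) for x in obs[key])
    return flat


# Made-up example values, six features per entry.
example_obs = {
    "observation": [0.10, 0.20, 0.00, 0.00, 1.00, 0.00],
    "achieved_goal": [0.10, 0.20, 0.00, 0.00, 1.00, 0.00],
    "desired_goal": [0.50, -0.30, 0.00, 0.00, 0.00, 1.00],
}

flat_obs = flatten_goal_observation(example_obs)
print(len(flat_obs))
```

Note that flattening throws away the explicit goal structure that HER relies on to relabel transitions, which is part of why the HER-based setups are recommended for this environment.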

If you have somehow got it working, could you share it? I can use my high-spec machine to run the Optuna hyperparameter tuning for you on those environments. It works fairly well (see below for A2C on the Intersection-v0 environment from that repo).

[image]