openai/baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Any pretrained agents for the MuJoCo envs for OpenAI/baselines? #384

Open benelot opened 6 years ago

benelot commented 6 years ago

Hello,

I think it would be very helpful to have agents pre-trained on the different gym environments. I am working on some transfer learning examples, and pre-trained agents would give me useful baselines. Does anyone have pre-trained agents to experiment with?

ling-pan commented 6 years ago

Agree. I am also looking for implementations of the A2C algorithm in MuJoCo envs. It would be very helpful if someone could share their pre-trained models.

jeremyf21 commented 6 years ago

It also seems that the baselines algorithms are not compatible with action spaces that are not defined by integers (e.g. the continuous action space of the FetchReach robot).
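
For context, a quick check in gym (assuming the robotics envs and mujoco-py are installed; the env id may differ by gym version) shows that FetchReach exposes a continuous `Box` action space rather than a `Discrete` one:

```python
import gym

env = gym.make('FetchReach-v1')
print(env.action_space)        # a Box, e.g. Box(4,) -- continuous actions
print(env.action_space.dtype)  # a float dtype, not integers
```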

pzhokhov commented 6 years ago

@jeremyf21 most of the baselines algorithms are applicable to both continuous and discrete action spaces, but the policies themselves (CnnPolicy, MlpPolicy, etc.) may not be. This is partially addressed in this PR: https://github.com/openai/baselines/pull/385/files, which makes the policies in the ppo2 submodule compatible with both continuous (gym.spaces.Box) and discrete (gym.spaces.Discrete) action spaces. We have not implemented similar logic for multi-discrete action spaces yet, but it is coming up.

Back to the original question: I think it's a good idea to publish those if we still have them somewhere, but I am not sure we'll publish them specifically in the baselines repo. I'll post here if I find out more.
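
For illustration, a minimal sketch of the mechanism that lets one policy serve both cases: baselines derives a probability-distribution type from the action space via `make_pdtype` in `baselines/common/distributions.py` (the env ids below are just examples):

```python
import gym
from baselines.common.distributions import make_pdtype

# Discrete action space -> a CategoricalPdType (distribution over integer actions)
disc_env = gym.make('CartPole-v1')
print(make_pdtype(disc_env.action_space))

# Box (continuous) action space -> a DiagGaussianPdType (diagonal Gaussian
# over real-valued action vectors)
cont_env = gym.make('Pendulum-v0')
print(make_pdtype(cont_env.action_space))
```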

pzhokhov commented 6 years ago

As part of the code quality improvement effort, it was decided (putting @joschu in the loop) to include the hyperparameters and document all the tips and tricks needed to reproduce state-of-the-art results, but not to include the trained models themselves, at least not in a first pass before we figure out good APIs (as it stands, the differing policy / model APIs make transfer learning experiments difficult).
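
In the meantime, a sketch of training and reloading your own agent from the command line, assuming the unified `baselines.run` entry point (the env id, timestep count, and paths are just examples):

```bash
# Train PPO2 on a MuJoCo env and save the final model
python -m baselines.run --alg=ppo2 --env=Hopper-v2 --network=mlp \
    --num_timesteps=1e6 --save_path=~/models/hopper_1M_ppo2

# Reload the saved model and run it in the env without further training
python -m baselines.run --alg=ppo2 --env=Hopper-v2 \
    --num_timesteps=0 --load_path=~/models/hopper_1M_ppo2 --play
```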