gth828r / ppo

An implementation of PPO in TensorFlow

Choose an environment to test with #2

Closed gth828r closed 3 years ago

gth828r commented 3 years ago

We need a simulation environment to test with, something like an environment from OpenAI Gym. Choose one to explore the capabilities of PPO combined with CQL.

gth828r commented 3 years ago

If we want to use OpenAI Gym, there are useful instructions on running that in Colab over at https://colab.research.google.com/drive/18LdlDDT87eb8cCTHZsXyS9ksQPzL3i6H

MuJoCo appears to have a 30-day trial followed by a $500 personal license. It also sounds like the licensing would make it hard to use with Colab: https://github.com/openai/mujoco-py/issues/358

Rather than dealing with that, let's stick with simpler tooling, probably an OpenAI Gym environment.

gth828r commented 3 years ago

Since we are just trying to validate our model, it is probably simplest to start off with a basic environment such as CartPole.
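
For reference, a minimal sketch of what the sanity-check loop could look like with a random policy standing in for the PPO agent. This assumes the classic `gym` API, where `reset()` returns only the observation and `step()` returns a 4-tuple; it is not code from this repo.

```python
# Minimal sketch: run one CartPole episode with random actions to verify the
# environment loop works before wiring in the PPO agent.
import gym

env = gym.make("CartPole-v1")
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random action, stand-in for the policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(f"Episode return with random actions: {total_reward}")
env.close()
```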

gth828r commented 3 years ago

CartPole will let us validate base functionality, but it won't be enough to actually test CQL's ability to generalize from an offline data distribution. We will verify that nothing is horribly broken with CartPole, but we'll need to graduate beyond that at some point.
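
For the offline part, we would eventually need a logged dataset of transitions from some behavior policy. A rough sketch of what that collection step might look like is below; the function name, buffer layout, and use of a random behavior policy are all hypothetical, just to illustrate the idea.

```python
# Rough sketch: collect transitions from a behavior policy into an offline
# dataset that a CQL-style learner could later train on. Layout is hypothetical.
import gym
import numpy as np

def collect_offline_dataset(env_name="CartPole-v1", num_episodes=100):
    env = gym.make(env_name)
    dataset = {"obs": [], "actions": [], "rewards": [], "next_obs": [], "dones": []}
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()  # stand-in for a trained behavior policy
            next_obs, reward, done, _ = env.step(action)
            dataset["obs"].append(obs)
            dataset["actions"].append(action)
            dataset["rewards"].append(reward)
            dataset["next_obs"].append(next_obs)
            dataset["dones"].append(done)
            obs = next_obs
    env.close()
    return {k: np.asarray(v) for k, v in dataset.items()}
```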

gth828r commented 3 years ago

Upon further research, it doesn't make a ton of sense to try to apply CQL to PPO. It makes more sense to apply it to SAC, which is off-policy and a much better fit for offline RL. Porting this issue over to https://github.com/gth828r/sac-cql/issues/2.