gth828r closed this issue 3 years ago
If we want to use OpenAI Gym, there are useful instructions for running it in Colab at https://colab.research.google.com/drive/18LdlDDT87eb8cCTHZsXyS9ksQPzL3i6H
MuJoCo appears to have a 30-day trial followed by a $500 personal license. It also sounds like the licensing would make it hard to use with Colab: https://github.com/openai/mujoco-py/issues/358
Rather than jumping ship, let's steer towards simpler tools, probably an OpenAI Gym environment.
Since we are trying to validate our model, it is probably easiest to start with a simple environment such as CartPole.
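A minimal sketch of what a CartPole smoke test could look like: roll out a uniformly random policy and keep the transitions as a small offline dataset. This assumes the Gymnasium fork of OpenAI Gym (the `reset`/`step` return shapes below are its API); `collect_transitions` and the transition tuple layout are hypothetical names chosen for illustration, not anything from this repo.

```python
"""Collect a small offline dataset from CartPole with a random policy."""


def collect_transitions(env, num_steps, seed=0):
    """Roll out uniformly random actions and return a list of transitions.

    Assumes `env` follows the Gymnasium API: reset() returns (obs, info)
    and step() returns (obs, reward, terminated, truncated, info).
    """
    env.action_space.seed(seed)
    obs, _ = env.reset(seed=seed)
    dataset = []
    for _ in range(num_steps):
        action = env.action_space.sample()
        next_obs, reward, terminated, truncated, _ = env.step(action)
        # Store only `terminated`: a time-limit truncation is not a true
        # terminal state, so it should not cut off value bootstrapping.
        dataset.append((obs, action, reward, next_obs, terminated))
        if terminated or truncated:
            obs, _ = env.reset()
        else:
            obs = next_obs
    return dataset


def main():
    try:
        import gymnasium as gym  # assumption: Gymnasium is installed
    except ImportError:
        print("gymnasium is not installed; skipping the CartPole rollout")
        return
    env = gym.make("CartPole-v1")
    data = collect_transitions(env, num_steps=1000)
    env.close()
    episodes = sum(1 for t in data if t[4])
    print(f"collected {len(data)} transitions, {episodes} terminated episodes")


if __name__ == "__main__":
    main()
```

A random policy only covers states near the upright start position, so a dataset like this is enough to check that training code runs end to end, not to judge how well the learned policy generalizes.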
CartPole will let us validate base functionality, but it won't be enough to actually test CQL's ability to generalize from a distribution learned offline. We will use CartPole to verify that nothing is horribly broken, but we'll need to graduate beyond it at some point.
Upon further research, it doesn't make much sense to apply CQL to PPO, which is an on-policy algorithm. It makes more sense to apply it to SAC, an off-policy algorithm that learns a Q-function and is therefore a more natural fit for offline RL. Porting this issue over to https://github.com/gth828r/sac-cql/issues/2.
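To illustrate why a Q-function matters here: the CQL regularizer is computed from per-action Q-values, pushing down a soft maximum over all actions while pushing up the value of the action actually seen in the dataset. The sketch below shows that penalty for a discrete-action batch in plain Python. The function names and the numbers are made up for illustration; a SAC-based version with continuous actions would need to approximate the log-sum-exp by sampling actions, which is not shown.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, dataset_actions):
    """Batch mean of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values: one list of per-action Q-values for each state in the batch.
    dataset_actions: index of the action the dataset took in each state.
    """
    gaps = [logsumexp(q) - q[a] for q, a in zip(q_values, dataset_actions)]
    return sum(gaps) / len(gaps)


# Two states with two discrete actions each (illustrative numbers only).
q_batch = [[1.0, 1.0], [0.0, 2.0]]
actions = [0, 1]
print(cql_penalty(q_batch, actions))
```

In training, this term would be scaled by a weight and added to the usual Bellman error. PPO's critic estimates a state value rather than per-action values, so there is nothing equivalent to penalize without changing the algorithm.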
We need a simulation environment to test with, something like an environment from OpenAI Gym. Choose one to explore the capabilities of PPO combined with CQL.