Maybe you didn't set TimeLimit around the env? gym.make() adds several wrappers like that automatically, which you don't get with a custom class.
A quick check would be to use gym.make() inside your custom env to create the inner MountainCar env, which should then behave identically.
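A minimal sketch of that quick check, assuming a Ray 0.7.x / RLlib setup; the class and registration names here are illustrative, not from the original thread:

```python
import gym
from ray import tune


class MyMountainCar(gym.Wrapper):
    """Custom env whose inner env is built with gym.make(), so the
    TimeLimit wrapper (999 steps for MountainCarContinuous-v0) is kept."""

    def __init__(self, env_config):
        # gym.make() applies the wrappers registered for the env id;
        # instantiating the raw env class directly would skip them.
        super().__init__(gym.make("MountainCarContinuous-v0"))


# Register under a name RLlib can reference in the trainer config.
tune.register_env("my_mountaincar", lambda env_config: MyMountainCar(env_config))
```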
Thanks, @ericl! That's exactly the problem.
What is the problem?
Ray version: 0.7.5
Gym version: 0.17.0
Python version: 3.6.10
TensorFlow version: 1.10.0
OS: macOS Mojave 10.14.6
I used MountainCarContinuous-v0 as a customized environment but got different results than when using it as an internal environment. The environment was not modified at all, and the hyper-parameters are taken from mountaincarcontinuous-ddpg.yaml in the tuned examples. The policy can still be trained. The detailed results are below (a rough sketch of both setups follows the results).
Use as internal environment:
Use as customized environment:
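For context, a rough sketch of the two setups being compared; the original report doesn't include the training script, so the registration code below is an assumption about what "customized environment" means here:

```python
import gym
from gym.envs.classic_control import Continuous_MountainCarEnv
from ray import tune

# "Internal" environment: RLlib resolves the string through gym.make(),
# which adds the TimeLimit wrapper (999 steps) registered for the id.
internal_env_config = {"env": "MountainCarContinuous-v0"}

# "Customized" environment: instantiating the env class directly skips
# gym.make(), so no TimeLimit is applied and episodes only terminate when
# the goal is reached -- effectively a different task for the agent.
tune.register_env("custom_mountaincar", lambda cfg: Continuous_MountainCarEnv())
custom_env_config = {"env": "custom_mountaincar"}
```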