yandexdataschool / Practical_RL

A course in reinforcement learning in the wild
The Unlicense

issues with gym #5

Closed justheuristic closed 5 years ago

justheuristic commented 7 years ago

If there's something wrong with openai gym and chat didn't resolve it in 10 minutes, feel free to complain here.

justheuristic commented 7 years ago

Contributed by Sergey Kolesnikov:

no GLX on GPU nodes

GLXInfoException: pyglet requires an X server with GLX

Solution: re-install GPU drivers without OpenGL.

https://github.com/openai/gym/issues/366

https://davidsanwald.github.io/2016/11/13/building-tensorflow-with-gpu-support.html

justheuristic commented 7 years ago

Game ends in 200 ticks

The current version of gym force-stops an environment after 200 steps even if you don't use env.monitor. This may ruin CEM on MountainCar and other environments (week1 homework). To avoid this, use env = gym.make("MountainCar-v0").env
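To illustrate why taking .env helps, here is a minimal sketch in plain Python. It does not use gym itself: ToyEnv, ToyTimeLimit, and run_episode are hypothetical stand-ins, assuming that gym.make returns the real environment wrapped in an object that counts steps and forces done after a fixed limit, and that the wrapper exposes the inner environment as .env.

```python
class ToyEnv:
    """Hypothetical environment that never terminates on its own."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # observation, reward, done, info (the 4-tuple style of older gym versions)
        return self.t, -1.0, False, {}


class ToyTimeLimit:
    """Stand-in for a time-limit wrapper (an assumption, not gym's actual code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # the wrapped environment, reachable as .env
        self.max_episode_steps = max_episode_steps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_episode_steps:
            done = True  # force-stop the episode regardless of the inner env
        return obs, reward, done, info


def run_episode(env, t_max):
    """Play until done or t_max steps; return (steps taken, total reward)."""
    env.reset()
    total = 0.0
    steps = 0
    for _ in range(t_max):
        _, reward, done, _ = env.step(0)
        total += reward
        steps += 1
        if done:
            break
    return steps, total


wrapped = ToyTimeLimit(ToyEnv(), max_episode_steps=200)
print(run_episode(wrapped, t_max=1000))      # cut off by the wrapper
print(run_episode(wrapped.env, t_max=1000))  # runs until t_max
```

With the wrapper in place, every episode is cut at 200 steps no matter what t_max is. Stepping the inner environment directly leaves your own t_max as the only cap, so keep t_max finite.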

oscartsai commented 7 years ago

For the bonus part of the week 0 homework, what is the expected time to solve "taxi-v1" with a genetic algorithm? The problem is that the game ends in 200 ticks, so in the end all policies in the pool are stuck at -200. I then tried ignoring the is_done flag and checking for a reward of 20 instead. I found that it would probably take several days to get a score of -100 on my notebook (CPU: i5), so I interrupted the process. Is it normal for this to take such a long time?

Below is what I got in 8 hours (I set t_max = 11000).

Epoch 0: best score: -44656.76
Epoch 1: best score: -42678.38
Epoch 2: best score: -42678.56
Epoch 3: best score: -32778.74
Epoch 4: best score: -32778.74
Epoch 5: best score: -44657.84
Epoch 6: best score: -42678.56
Epoch 7: best score: -38718.74
Epoch 8: best score: -38717.12
Epoch 9: best score: -38718.74
Epoch 10: best score: -36738.02
Epoch 11: best score: -32777.84
Epoch 12: best score: -38717.12
Epoch 13: best score: -36738.02
Epoch 14: best score: -32779.1
Epoch 15: best score: -28819.28
Epoch 16: best score: -30799.28
Epoch 17: best score: -30799.64
Epoch 18: best score: -32779.28
Epoch 19: best score: -30800.0
Epoch 20: best score: -32779.1
Epoch 21: best score: -30799.1
Epoch 22: best score: -32777.48
Epoch 23: best score: -32778.38
Epoch 24: best score: -30799.64

justheuristic commented 5 years ago

For the record, we fixed it by removing the time limit (env = gym.make('Taxi-v1').env).