jinzishuai / learn2deeplearn

A repository of code for learning deep learning
GNU General Public License v3.0

Why does the human chosen policy work so poorly on the FrozenLake problem #40

Closed jinzishuai closed 6 years ago

jinzishuai commented 6 years ago

https://github.com/jinzishuai/learn2deeplearn/tree/master/learnRL/OpenAIGym/FrozenLake

For example, the policy I chose by hand (the first choice, shown in the image) only worked for around 50 out of 1000 episodes, while the result of Q-table learning succeeds at about a 70% rate.
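For reference, the Q-table result mentioned above comes from tabular Q-learning. A minimal sketch of the update rule (not the repo's actual script; `q_update` and its default hyperparameters are my own illustration):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.8, gamma=0.95):
    """One tabular Q-learning step: move Q[s, a] toward the TD target."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q

# Tiny worked example: 2 states, 2 actions, reward 1 for (s=0, a=1) -> s=1.
Q = np.zeros((2, 2))
q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])  # 0.8 after one update (alpha * r, since Q[s_next] is all zero)
```

Run over many episodes with an exploration schedule, updates like this converge to a policy that beats any fixed hand-picked one on the slippery map.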

jinzishuai commented 6 years ago

Study Source Code

https://github.com/openai/gym/blob/master/gym/envs/toy_text/frozen_lake.py

jinzishuai commented 6 years ago

How to test without "slippery"?

Answer: https://github.com/openai/gym/issues/565

With Slippery Off, My Policy Works 100%

Not surprisingly, my policy works every time, since the environment becomes deterministic.

seki@seki-VirtualBox:~/src/learn2deeplearn/learnRL/OpenAIGym/FrozenLake$ head flnonslip.py  -n 20
#!/usr/bin/python2.7
import gym
import numpy as np
from gym.envs.registration import register
register(
    id='FrozenLakeNotSlippery-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name' : '4x4', 'is_slippery': False},
    max_episode_steps=100,
    reward_threshold=0.78, # optimum = .8196
)
env = gym.make('FrozenLakeNotSlippery-v0')
#Initialize table with all zeros
X = -1
policy=np.array(
        [[1, 2, 2, 0],
         [1, X, 1, X],
         [2, 1, 1, X],
         [X, 2, 2, X]]
        )
seki@seki-VirtualBox:~/src/learn2deeplearn/learnRL/OpenAIGym/FrozenLake$ ./flnonslip.py 
/usr/local/lib/python2.7/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.22) or chardet (2.3.0) doesn't match a supported version!
  RequestsDependencyWarning)
1000 out of 1000 runs were successful
seki@seki-VirtualBox:~/src/learn2deeplearn/learnRL/OpenAIGym/FrozenLake$ 
jinzishuai commented 6 years ago

Slippery Algorithm

(screenshot: the slippery transition code in frozen_lake.py)

Not random but 3 steps

It basically registers three smaller moves: the specified direction plus the two adjacent directions immediately before and after it. But if any of these reaches a terminal state, the function returns right away.
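The three-direction logic described above can be sketched as follows (simplified from frozen_lake.py; the function name is mine). Direction indices wrap modulo 4, so each intended action maps to itself plus its two perpendicular neighbours:

```python
def slippery_transitions(action):
    """For one intended action, list the three directions the env registers
    as equally likely outcomes, each with probability 1/3."""
    return [((action + delta) % 4, 1.0 / 3.0) for delta in (-1, 0, 1)]

# Intended action 1 (Down) can slip to 0 (Left) or 2 (Right).
print(slippery_transitions(1))
```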

But the step() function still introduces randomness

class FrozenLakeEnv(discrete.DiscreteEnv)

The parent class is defined here https://github.com/openai/gym/blob/master/gym/envs/toy_text/discrete.py and its step() function picks one of the listed transitions according to its probability, so the specified direction is only taken some of the time.

What is the probability?
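In discrete.py, step() draws the next transition by categorical sampling over the listed probabilities; for the slippery map each of the three directions carries probability 1/3. A sketch of that sampling (not the exact gym code):

```python
import numpy as np

def categorical_sample(probs, rng):
    """Pick index i with probability probs[i] (mirrors gym's discrete.py helper)."""
    cumulative = np.cumsum(probs)
    return int((cumulative > rng.random()).argmax())

rng = np.random.default_rng(0)
probs = [1 / 3, 1 / 3, 1 / 3]
counts = np.bincount(
    [categorical_sample(probs, rng) for _ in range(30000)], minlength=3
)
print(counts / 30000)  # each frequency close to 1/3
```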

jinzishuai commented 6 years ago

Slippery is Stochastic

jinzishuai commented 6 years ago

Conclusion

Due to the stochastic nature of the game, the deterministic policy I chose above is not necessarily a good solution. In fact, the tests show that it is not good at all.
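To see just how badly it does, the slippery dynamics can be re-simulated without gym (a sketch, assuming each action slips to the intended direction or one of its two neighbours with probability 1/3 each, as in frozen_lake.py):

```python
import random

MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
HOLES = {r * 4 + c for r in range(4) for c in range(4) if MAP[r][c] == "H"}
GOAL = 15

def move(s, d):
    """Apply direction d (0=Left, 1=Down, 2=Right, 3=Up), clamped at the borders."""
    r, c = divmod(s, 4)
    if d == 0:   c = max(c - 1, 0)
    elif d == 1: r = min(r + 1, 3)
    elif d == 2: c = min(c + 1, 3)
    else:        r = max(r - 1, 0)
    return r * 4 + c

def run_episode(policy, rng, max_steps=100):
    s = 0
    for _ in range(max_steps):
        a = policy[s]
        d = rng.choice([(a - 1) % 4, a, (a + 1) % 4])  # slippery: 1/3 each
        s = move(s, d)
        if s == GOAL:
            return True
        if s in HOLES:
            return False
    return False

X = -1  # holes / goal: never consulted, the episode ends first
policy = [1, 2, 2, 0,
          1, X, 1, X,
          2, 1, 1, X,
          X, 2, 2, X]
rng = random.Random(0)
wins = sum(run_episode(policy, rng) for _ in range(10000))
print("success rate:", wins / 10000)  # far below the ~70% of the learned Q-table
```

The estimated rate lands in the same low range as the ~50/1000 reported at the top of this issue, confirming that the failure comes from the dynamics, not from a bug in the evaluation.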