Answer: https://github.com/openai/gym/issues/565
Not surprisingly, my policy now works all the time, since the environment becomes deterministic.
seki@seki-VirtualBox:~/src/learn2deeplearn/learnRL/OpenAIGym/FrozenLake$ head flnonslip.py -n 20
#!/usr/bin/python2.7
import gym
import numpy as np
from gym.envs.registration import register
register(
    id='FrozenLakeNotSlippery-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False},
    max_episode_steps=100,
    reward_threshold=0.78, # optimum = .8196
)
env = gym.make('FrozenLakeNotSlippery-v0')
#Initialize table with all zeros
X = -1
policy = np.array(
    [[1, 2, 2, 0],
     [1, X, 1, X],
     [2, 1, 1, X],
     [X, 2, 2, X]]
)
seki@seki-VirtualBox:~/src/learn2deeplearn/learnRL/OpenAIGym/FrozenLake$ ./flnonslip.py
/usr/local/lib/python2.7/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.22) or chardet (2.3.0) doesn't match a supported version!
RequestsDependencyWarning)
1000 out of 1000 runs were successful
seki@seki-VirtualBox:~/src/learn2deeplearn/learnRL/OpenAIGym/FrozenLake$
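Note that head only shows the first 20 lines of flnonslip.py; the part of the script that prints the "1000 out of 1000 runs were successful" line is not shown above. A minimal evaluation loop of that kind (a sketch under my own assumptions, not the actual tail of the file) could look like this:
# Sketch of an evaluation loop; the real rest of flnonslip.py is not shown above.
n_episodes = 1000
n_success = 0
for _ in range(n_episodes):
    state = env.reset()
    done = False
    while not done:
        row, col = state // 4, state % 4   # 4x4 map: state index -> grid cell
        action = policy[row, col]          # 0=Left, 1=Down, 2=Right, 3=Up
        state, reward, done, info = env.step(action)
    if reward > 0:                         # FrozenLake only rewards reaching the goal
        n_success += 1
print('%d out of %d runs were successful' % (n_success, n_episodes))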
With is_slippery=True, each chosen move basically has 3 possible outcomes: the specified direction or the two adjacent (perpendicular) directions, each with probability 1/3. If any of these transitions reaches a terminal state (a hole or the goal), the episode ends right away.
class FrozenLakeEnv(discrete.DiscreteEnv)
The parent class is defined here: https://github.com/openai/gym/blob/master/gym/envs/toy_text/discrete.py, and its step() function samples the actual transition from that table, so the specified direction is only taken with a certain probability.
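A simplified sketch of that mechanism (paraphrased for illustration, not the literal gym source) is below; with is_slippery=True the intended action is effectively replaced by a random choice among the intended direction and its two neighbours:
import numpy as np

# Paraphrased sketch of the slippery transition logic, not the literal gym source.
def slippery_outcomes(action):
    # The intended direction plus the two adjacent (perpendicular) directions.
    return [(action - 1) % 4, action, (action + 1) % 4]

def sample_direction(action):
    # DiscreteEnv.step() does a categorical sample over the transition
    # probabilities; with is_slippery=True each outcome has probability 1/3.
    return np.random.choice(slippery_outcomes(action))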
Due to the stochastic nature of the game, the deterministic choice I made above is not necessarily a good solution; in fact, the tests show it performs quite poorly.
https://github.com/jinzishuai/learn2deeplearn/tree/master/learnRL/OpenAIGym/FrozenLake
For example, the deterministic policy I chose above, using the first choice only, worked for around 50 out of 1000 episodes, while the Q-table learning results achieve about a 70% success rate.
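For reference, the Q-table number comes from standard tabular Q-learning on the slippery FrozenLake-v0; a minimal sketch (the hyperparameters here are illustrative, not necessarily the ones used in the repo) is:
import gym
import numpy as np

env = gym.make('FrozenLake-v0')   # default, slippery version
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, n_episodes = 0.8, 0.95, 2000

for i in range(n_episodes):
    state = env.reset()
    done = False
    while not done:
        # Greedy action plus decaying random noise for exploration.
        action = np.argmax(Q[state, :] + np.random.randn(env.action_space.n) / (i + 1))
        next_state, reward, done, _ = env.step(action)
        # Tabular Q-learning update.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state, :]) - Q[state, action])
        state = next_state

# A greedy policy can then be read off with np.argmax(Q, axis=1).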