jinzishuai opened this issue 6 years ago
```
E:\ShiJin\learn2deeplearn\learnRL\OpenAIGym\FrozenLake\genetic>python frozenlake_genetic_algorithm.py
[2018-01-29 15:47:57,703] Making new env: FrozenLake-v0
Generation 1 : max score = 0.20
Generation 2 : max score = 0.30
Generation 3 : max score = 0.60
Generation 4 : max score = 0.66
Generation 5 : max score = 0.79
Generation 6 : max score = 0.84
Generation 7 : max score = 0.80
Generation 8 : max score = 0.78
Generation 9 : max score = 0.79
Generation 10 : max score = 0.80
Generation 11 : max score = 0.80
Generation 12 : max score = 0.80
Generation 13 : max score = 0.82
Generation 14 : max score = 0.80
Generation 15 : max score = 0.86
Generation 16 : max score = 0.81
Generation 17 : max score = 0.81
Generation 18 : max score = 0.78
Generation 19 : max score = 0.84
Generation 20 : max score = 0.79
Best policy score = 0.85. Time taken = 52.1701
Best policy = [[0 3 3 3]
 [0 3 0 1]
 [3 1 0 3]
 [3 2 1 0]]
```
This is no better than my own algorithms:
```
E:\ShiJin\learn2deeplearn\learnRL\OpenAIGym\FrozenLake>python fl_human_policy.py
[2018-01-29 15:51:33,075] Making new env: FrozenLake-v0
policy=
[[ 0  3  3  3]
 [ 0 -1  0 -1]
 [ 3  1  0 -1]
 [-1  2  1 -1]]
7355 out of 10000 runs were successful
```
There is clearly a difference between the first and second computation of the scores, due to stochasticity:
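This scatter is expected sampling noise. If each per-generation score is an average over n episodes (the two-decimal values above suggest n = 100 per evaluation, which is my assumption, not something stated in the issue), then a Bernoulli success-rate estimate has standard error sqrt(p(1-p)/n):

```python
import math

def std_error(p, n):
    """Standard error of a success-rate estimate averaged over n episodes."""
    return math.sqrt(p * (1 - p) / n)

# At a true success rate of 0.8, a 100-episode score fluctuates by
# roughly +/- 0.04 (one standard error), matching the jitter above.
print(std_error(0.8, 100))    # -> 0.04
print(std_error(0.8, 10000))  # -> 0.004
```

So generation-to-generation differences of a few hundredths carry essentially no signal at 100 episodes per evaluation.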
```
E:\ShiJin\learn2deeplearn\learnRL\OpenAIGym\FrozenLake\genetic>python frozenlake_genetic_algorithm.py
[2018-01-29 20:29:33,700] Making new env: FrozenLake-v0
Generation 1 : max score = 0.15, recomputed score=0.16
Generation 2 : max score = 0.34, recomputed score=0.28
Generation 3 : max score = 0.65, recomputed score=0.59
Generation 4 : max score = 0.74, recomputed score=0.84
Generation 5 : max score = 0.80, recomputed score=0.84
Generation 6 : max score = 0.82, recomputed score=0.67
Generation 7 : max score = 0.85, recomputed score=0.67
Generation 8 : max score = 0.81, recomputed score=0.74
Generation 9 : max score = 0.81, recomputed score=0.61
Generation 10 : max score = 0.79, recomputed score=0.61
Generation 11 : max score = 0.78, recomputed score=0.70
Generation 12 : max score = 0.83, recomputed score=0.71
Generation 13 : max score = 0.82, recomputed score=0.74
Generation 14 : max score = 0.81, recomputed score=0.67
Generation 15 : max score = 0.77, recomputed score=0.70
Generation 16 : max score = 0.80, recomputed score=0.76
Generation 17 : max score = 0.80, recomputed score=0.67
Generation 18 : max score = 0.83, recomputed score=0.76
Best policy score = 0.84. Time taken = 90.5285
Best policy = [[0 3 3 3]
 [0 2 2 2]
 [3 1 0 2]
 [2 2 1 3]]
best policy score = 0.71
```
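Note also that the recomputed score is usually lower than the max score. That is selection bias: the maximum of several noisy estimates is biased upward relative to a fresh, independent estimate of the same policy. A small self-contained simulation (plain Python, not the issue's code) illustrates the effect:

```python
import random

random.seed(0)

def noisy_eval(p_true, n_episodes=100):
    """Estimate a success rate from n_episodes Bernoulli rollouts."""
    wins = sum(random.random() < p_true for _ in range(n_episodes))
    return wins / n_episodes

# 20 candidate policies, all with the same true success rate of 0.70.
p_true = 0.70
scores = [noisy_eval(p_true) for _ in range(20)]
print("max of 20 noisy scores:", max(scores))    # systematically above 0.70
print("fresh re-evaluation:   ", noisy_eval(p_true))  # unbiased, ~0.70
```

With 20 candidates and 100 episodes each, the max is typically inflated by several hundredths even when every policy is identical, which matches the gaps seen in the log above.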
```
E:\ShiJin\learn2deeplearn\learnRL\OpenAIGym\FrozenLake\genetic>python frozenlake_genetic_algorithm.py
[2018-01-29 20:31:13,861] Making new env: FrozenLake-v0
Generation 1 : max score = 0.17, recomputed score=0.17
Generation 2 : max score = 0.28, recomputed score=0.29
Generation 3 : max score = 0.60, recomputed score=0.59
Generation 4 : max score = 0.71, recomputed score=0.69
Generation 5 : max score = 0.70, recomputed score=0.70
Generation 6 : max score = 0.70, recomputed score=0.69
Generation 7 : max score = 0.75, recomputed score=0.74
Generation 8 : max score = 0.74, recomputed score=0.73
Generation 9 : max score = 0.74, recomputed score=0.76
Generation 10 : max score = 0.75, recomputed score=0.72
Generation 11 : max score = 0.75, recomputed score=0.73
Generation 12 : max score = 0.75, recomputed score=0.72
Generation 13 : max score = 0.75, recomputed score=0.72
Generation 14 : max score = 0.76, recomputed score=0.73
Generation 15 : max score = 0.76, recomputed score=0.72
Generation 16 : max score = 0.75, recomputed score=0.73
Generation 17 : max score = 0.75, recomputed score=0.71
Generation 18 : max score = 0.75, recomputed score=0.73
Best policy score = 0.77. Time taken = 859.8491
Best policy = [[0 3 3 3]
 [0 3 2 1]
 [3 1 0 0]
 [2 2 1 1]]
best policy score = 0.74
```
```python
def evaluate_policy(env, policy, n_episodes=10000):
    # Average success rate over n_episodes rollouts of the policy.
    total_rewards = 0.0
    for _ in range(n_episodes):
        total_rewards += run_episode(env, policy)
    return total_rewards / n_episodes
```
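For context, `run_episode` is not shown in the issue. A plausible sketch for the 4x4 FrozenLake map under the gym API of that era (the 4-tuple `step` return), with `policy` indexed as a 4x4 grid of actions, might look like this — the names and details here are my assumption, not the issue's actual code:

```python
def run_episode(env, policy, max_steps=100):
    """Roll out one episode; returns 1.0 if the goal is reached, else 0.0.

    `policy` is a 4x4 grid of actions (0=left, 1=down, 2=right, 3=up)
    indexed by the flat FrozenLake state: row = state // 4, col = state % 4.
    """
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy[state // 4][state % 4]
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

FrozenLake only pays a reward of 1.0 on reaching the goal, so averaging this return over episodes gives the success rate directly.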
Results:

```
E:\ShiJin\learn2deeplearn\learnRL\OpenAIGym\FrozenLake\genetic>python frozenlake_genetic_algorithm.py
[2018-01-29 20:49:12,746] Making new env: FrozenLake-v0
Generation 1 : max score = 0.16, recomputed score=0.16
Generation 2 : max score = 0.30, recomputed score=0.30
Generation 3 : max score = 0.48, recomputed score=0.48
Generation 4 : max score = 0.69, recomputed score=0.70
Generation 5 : max score = 0.74, recomputed score=0.74
Generation 6 : max score = 0.74, recomputed score=0.74
Generation 7 : max score = 0.74, recomputed score=0.74
Generation 8 : max score = 0.73, recomputed score=0.73
Generation 9 : max score = 0.75, recomputed score=0.74
Generation 10 : max score = 0.74, recomputed score=0.74
Generation 11 : max score = 0.74, recomputed score=0.74
Generation 12 : max score = 0.74, recomputed score=0.74
Generation 13 : max score = 0.75, recomputed score=0.74
Generation 14 : max score = 0.74, recomputed score=0.74
Generation 15 : max score = 0.75, recomputed score=0.74
Generation 16 : max score = 0.75, recomputed score=0.74
Generation 17 : max score = 0.75, recomputed score=0.75
Generation 18 : max score = 0.75, recomputed score=0.75
Best policy score = 0.75. Time taken = 9602.4171
Best policy = [[0 3 3 3]
 [0 2 0 1]
 [3 1 0 3]
 [1 2 1 3]]
best policy score = 0.74
```
But my results are around 70-75%: https://github.com/jinzishuai/learn2deeplearn/blob/master/learnRL/OpenAIGym/FrozenLake/results.md

We should test this submission, which claims 0.79 ± 0.05: https://gym.openai.com/evaluations/eval_4VyQBhXMRLmG9y9MQA5ePA/
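For what it's worth, the two numbers may not actually conflict: 0.74 sits exactly at the lower edge of 0.79 ± 0.05, and a 10000-episode estimate of 0.74 is itself quite tight. A quick check (plain Python, my own arithmetic, not the referenced evaluation's methodology):

```python
import math

p_hat, n = 0.74, 10000  # our 10000-episode estimate
se = math.sqrt(p_hat * (1 - p_hat) / n)  # ~0.0044
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI for our estimate: [{lo:.3f}, {hi:.3f}]")

claimed_lo, claimed_hi = 0.79 - 0.05, 0.79 + 0.05  # [0.74, 0.84]
print("overlaps claimed interval:", hi >= claimed_lo)
```

The intervals barely overlap, so the claimed 0.79 is at the optimistic end of what these runs support; reproducing that submission would settle it.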