muupan / async-rl

Replicating "Asynchronous Methods for Deep Reinforcement Learning" (http://arxiv.org/abs/1602.01783)
MIT License

Some evaluation results are missing #5

Open muupan opened 8 years ago

muupan commented 8 years ago

In scores.txt of the currently uploaded trained model, the evaluation results at 55000000 and 56000000 are missing.

https://github.com/muupan/async-rl/blob/0ec501c36b6cdbe3888d3a6fc0043e4bc6c2cba3/trained_model/breakout/scores.txt#L55

I don't know why this happens or whether it affects performance. I need to check.

muupan commented 8 years ago

I found that the missing evaluations are caused by processes getting stuck in evaluate_performance(). It is possible that some policies fail to start playing Breakout, so their episodes never terminate. If so, it might be necessary to use epsilon-greedy-like action selection in addition to sampling from softmax policies in test runs.
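
A rough sketch of what that epsilon-greedy-like selection could look like. This is not the code in this repository: `sample_action`, `epsilon`, and `rng` are hypothetical names, and `probs` is assumed to be a plain list of softmax probabilities over the actions.

```python
import random


def sample_action(probs, epsilon=0.05, rng=random):
    """Pick an action for a test run (hypothetical helper).

    With probability epsilon, choose uniformly at random so that every
    action keeps a nonzero chance of being taken. Otherwise, sample from
    the softmax policy as usual.
    """
    n_actions = len(probs)
    if rng.random() < epsilon:
        # Uniform exploration step, independent of the policy output.
        return rng.randrange(n_actions)
    # Ordinary sampling from the policy distribution.
    return rng.choices(range(n_actions), weights=probs, k=1)[0]
```

With a small epsilon, a policy that puts almost no probability on the action that launches the ball should still take it occasionally. The value 0.05 is only a placeholder, not a tuned setting.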

muupan commented 8 years ago

It didn't occur for Space Invaders. For Breakout we might need to force long episodes to finish.
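
One possible way to force long episodes to finish is a step cap in the evaluation loop. The sketch below is only an illustration under assumptions: `run_episode`, `max_steps`, and the `reset()` / `step()` interface are hypothetical and do not match the actual environment wrapper or evaluate_performance() in this repository. `StuckEnv` is a toy stand-in for a game that never ends until a particular action is taken.

```python
def run_episode(env, choose_action, max_steps=10000):
    """Run one evaluation episode, giving up after max_steps actions.

    Returns (total_reward, steps_taken, truncated).
    Assumes env.reset() returns an observation and env.step(action)
    returns (observation, reward, done); both are hypothetical here.
    """
    obs = env.reset()
    total_reward = 0.0
    for step in range(max_steps):
        obs, reward, done = env.step(choose_action(obs))
        total_reward += reward
        if done:
            return total_reward, step + 1, False
    # The episode did not terminate on its own, so it is cut off here.
    return total_reward, max_steps, True


class StuckEnv:
    """Toy environment: the episode ends only when action 1 is taken."""

    def reset(self):
        return 0

    def step(self, action):
        done = action == 1
        return 0, (1.0 if done else 0.0), done


# A policy that never takes action 1 would loop forever without the cap.
reward, steps, truncated = run_episode(StuckEnv(), lambda obs: 0, max_steps=50)
```

Returning the `truncated` flag lets the caller decide whether to record the score of a cut-off episode or to report it separately.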