Open muupan opened 8 years ago
I found that the missing evaluations are caused by processes getting stuck in `evaluate_performance()`. It is possible that some policies fail to start playing Breakout, which prevents episodes from terminating. If so, it might be necessary to use epsilon-greedy-like action selection in addition to sampling from softmax policies in test runs.
It didn't occur for Space Invaders. For Breakout we might need to force long episodes to finish.
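To make the two ideas concrete, here is a minimal sketch of what such an evaluation loop could look like. It is not the code in this repository: `sample_action`, `run_eval_episode`, `ToyEnv` and `never_fires` are hypothetical names, the `reset()` / `step()` interface returning `(obs, reward, done)` is an assumption, and the `epsilon` and `max_steps` values are arbitrary. The toy environment only ends an episode once action 1 is chosen, standing in for a game that never starts.

```python
import numpy as np


def sample_action(probs, epsilon, rng):
    """Sample from the softmax policy, but with probability epsilon
    pick an action uniformly at random instead."""
    if rng.random() < epsilon:
        return int(rng.integers(len(probs)))
    return int(rng.choice(len(probs), p=probs))


def run_eval_episode(env, policy, epsilon, max_steps, rng):
    """Run one evaluation episode, giving up after max_steps.

    Returns (total_reward, steps_taken, truncated).
    """
    obs = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = sample_action(policy(obs), epsilon, rng)
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            return total_reward, step, False
    # The episode did not terminate on its own: force it to finish.
    return total_reward, max_steps, True


class ToyEnv:
    """Hypothetical stand-in: the episode only ends once action 1 is chosen."""

    def reset(self):
        return 0

    def step(self, action):
        done = action == 1
        return 0, float(done), done


def never_fires(obs):
    """A degenerate policy that puts all probability on action 0."""
    return np.array([1.0, 0.0])


rng = np.random.default_rng(0)
# Pure softmax sampling: only the step cap ends the episode.
_, steps_plain, truncated_plain = run_eval_episode(
    ToyEnv(), never_fires, epsilon=0.0, max_steps=1000, rng=rng)
# A little uniform exploration lets the episode end on its own.
_, steps_eps, truncated_eps = run_eval_episode(
    ToyEnv(), never_fires, epsilon=0.1, max_steps=1000, rng=rng)
print(truncated_plain, steps_plain, truncated_eps, steps_eps)
```

With `epsilon=0.0` the degenerate policy never ends the episode, so only the step cap stops it. A small `epsilon` gives the episode a chance to terminate by itself. In practice both safeguards could be combined, since the cap alone still wastes `max_steps` steps per stuck episode.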
In `scores.txt` of the currently uploaded trained model, the evaluation results at 55000000 and 56000000 are missing:
https://github.com/muupan/async-rl/blob/0ec501c36b6cdbe3888d3a6fc0043e4bc6c2cba3/trained_model/breakout/scores.txt#L55
I don't know why and whether it can affect performance. I need to check.
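One way to check for such gaps is sketched below. It assumes evaluations are expected at a fixed step interval; the interval of 1000000 is only inferred from the two missing values above, and the recorded steps in the example are made up for illustration. `find_missing_evaluations` is a hypothetical helper, and it works on a plain list of step numbers rather than parsing `scores.txt`, whose exact format is not shown here.

```python
def find_missing_evaluations(recorded_steps, first_step, last_step, interval):
    """Return the expected evaluation steps that have no recorded result."""
    recorded = set(recorded_steps)
    expected = range(first_step, last_step + 1, interval)
    return [step for step in expected if step not in recorded]


# Illustrative data only: steps around the gap, assuming a 1000000-step interval.
recorded = [53000000, 54000000, 57000000, 58000000]
missing = find_missing_evaluations(
    recorded, first_step=53000000, last_step=58000000, interval=1000000)
print(missing)  # [55000000, 56000000]
```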