sanjeevk42 closed this issue 6 years ago.
Hi, I also noticed this issue in the code. Have you run any tests for this?
Chen
I found no error in that code. The runner adds a rollout to its queue after tmax steps, and the while loop in pull_batch_from_queue takes it off the queue. If the queue is empty, the while loop exits and that rollout is processed.
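A minimal sketch of the loop behavior described above. It assumes a simplified rollout object with a terminal flag and an extend() method. Rollout and its fields are hypothetical stand-ins modeled on the linked a3c.py, not its exact code. The learner blocks until one rollout is available. It then keeps merging whatever is already waiting in the queue, stopping at a terminal rollout or when the queue is empty.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """Hypothetical stand-in for a partial rollout of at most tmax steps."""
    rewards: list = field(default_factory=list)
    terminal: bool = False

    def extend(self, other: "Rollout") -> None:
        # Append the next chunk; the merged rollout is terminal if that chunk is.
        self.rewards.extend(other.rewards)
        self.terminal = other.terminal


def pull_batch_from_queue(q: queue.Queue, timeout: float = 600.0) -> Rollout:
    # Block until at least one rollout has been produced by the runner.
    rollout = q.get(timeout=timeout)
    # Merge rollouts that are already queued, stopping at an episode end.
    while not rollout.terminal:
        try:
            rollout.extend(q.get_nowait())
        except queue.Empty:
            # Nothing else is waiting, so train on what has been collected.
            break
    return rollout


q = queue.Queue()
q.put(Rollout([1.0] * 5))
print(len(pull_batch_from_queue(q).rewards))  # one 5-step chunk
```

Because get_nowait() never blocks, chunks are merged only if they piled up while the learner was busy. When the learner keeps pace, each update sees a single tmax-step rollout. When it falls behind, several chunks can be merged, up to the end of the episode.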
The while loop in the pull_batch_from_queue() method of A3C (https://github.com/openai/universe-starter-agent/blob/master/a3c.py#L255) causes the network to be updated after each episode terminates instead of after tmax (num_local_steps) steps, as described in the A3C paper. This will produce highly correlated updates, and the network will fail to converge to a better policy. Am I missing something here?
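For comparison, the A3C paper's scheme cuts experience into segments of at most tmax steps and ends a segment early when the episode terminates. This is a minimal sketch of that producer side. It assumes a toy episode given as a list of per-step done flags. The function split_into_rollouts and its inputs are hypothetical and for illustration only; they are not part of universe-starter-agent.

```python
from typing import List, Tuple


def split_into_rollouts(dones: List[bool], num_local_steps: int) -> List[Tuple[int, bool]]:
    """Return (length, terminal) for each rollout a runner would enqueue.

    A rollout is emitted after num_local_steps (tmax) steps, or earlier
    when the episode ends. This sketch ignores an unfinished tail.
    """
    rollouts = []
    length = 0
    for done in dones:
        length += 1
        # Cut a segment at tmax steps or at the end of the episode.
        if done or length == num_local_steps:
            rollouts.append((length, done))
            length = 0
    return rollouts


# A 12-step episode with tmax = 5 yields two full segments and a short final one.
print(split_into_rollouts([False] * 11 + [True], 5))
```

Each tuple corresponds to one queue entry. Updates happen every tmax steps if the learner consumes these entries one at a time. If it merges several queued entries before updating, the batches get longer, up to a whole episode.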