openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

Global network is not being updated after tmax steps. #114

Closed sanjeevk42 closed 6 years ago

sanjeevk42 commented 7 years ago

The while loop in the pull_batch_from_queue() method of A3C (https://github.com/openai/universe-starter-agent/blob/master/a3c.py#L255) seems to cause the network to be updated only after each episode terminates, instead of after every tmax (num_local_steps) steps as described in the A3C paper. This would produce highly correlated updates, and the network would fail to converge to a better policy. Am I missing something here?

caseypen commented 7 years ago

Hi, I also noticed this issue in the code. Have you run any tests to confirm it?

Chen

ghost commented 7 years ago

I found no error in that code. A rollout is added to the runner queue after tmax steps, and the while loop in pull_batch_from_queue pops it. If the queue is empty, the while loop ends and that rollout is processed.
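
To make the queue behaviour described above concrete, here is a minimal sketch. It is not the repository's code: the `Rollout` class and the `pull_batch` function are hypothetical stand-ins, and it assumes only what the comment states, namely that the runner enqueues one partial rollout per tmax steps and that the consumer blocks for the first rollout, then drains without blocking.

```python
import queue


class Rollout:
    """Hypothetical stand-in for a partial rollout of at most tmax steps."""

    def __init__(self, steps, terminal=False):
        self.steps = list(steps)
        self.terminal = terminal

    def extend(self, other):
        # Append the later rollout's steps and inherit its terminal flag.
        self.steps.extend(other.steps)
        self.terminal = other.terminal


def pull_batch(q, timeout=1.0):
    """Block for one rollout, then merge any others already waiting."""
    rollout = q.get(timeout=timeout)
    while not rollout.terminal:
        try:
            rollout.extend(q.get_nowait())
        except queue.Empty:
            # Nothing else is queued: stop and train on what we have.
            break
    return rollout


# Case 1: a single non-terminal rollout of 5 steps is queued.
q = queue.Queue()
q.put(Rollout(range(5)))
single = pull_batch(q)

# Case 2: the consumer fell behind, so several rollouts are waiting.
q.put(Rollout(range(5)))
q.put(Rollout(range(5, 8), terminal=True))
q.put(Rollout(range(8, 10)))
merged = pull_batch(q)
left_in_queue = q.qsize()
```

In the first case the loop exits on `queue.Empty` and the batch holds exactly one tmax-sized rollout even though the episode has not ended. Merging happens only when the consumer is slower than the runner, and even then it stops at the first terminal rollout.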