openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

Question about global network update logic in A3C implementation #128

Closed Emerald01 closed 6 years ago

Emerald01 commented 6 years ago

Hello folks, I have a question about how the global network is updated from the local networks in this A3C implementation. If I understand the code correctly (if not, please correct me), the global network parameters are updated with gradients computed by the local networks:

```python
# self.loss is the loss of the local network
self.loss = pi_loss + 0.5 * vf_loss - entropy * 0.01
grads = tf.gradients(self.loss, pi.var_list)
# self.network is the global network
grads_and_vars = list(zip(grads, self.network.var_list))
# apply local gradients to the global network
self.train_op = tf.group(opt.apply_gradients(grads_and_vars), inc_step)
```
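For context, each worker first copies the global weights into its local network before computing gradients; in the repo's a3c.py this sync is done with an op roughly along these lines:

```python
# copy global weights (self.network.var_list) into the local policy (pi.var_list)
self.sync = tf.group(*[v_local.assign(v_global) for v_local, v_global in
                       zip(pi.var_list, self.network.var_list)])
```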

My confusion is here: say we have two environments running in parallel, ev1 and ev2. To start, each syncs its local network with the global network, and each local network then computes its own gradients from its local loss, say grad1 for ev1 and grad2 for ev2. According to the code, as soon as either worker obtains its gradients, they are applied to the global var_list. So if grad1 finishes first, it is applied to the global network; later grad2 arrives and updates the global network again. But by the time grad2 is applied, the global network has already been changed by grad1, while grad2 was computed with respect to the original, older global parameters. My question is: how does it make sense to apply grad2 to the new global network when grad2 was obtained from the original one?
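To make the timeline concrete, here is a toy scalar sketch of the scenario I mean (hypothetical names like `loss_grad`; not code from the repo):

```python
lr = 0.1

def loss_grad(theta):
    # stand-in for a worker's policy/value gradient: d/dtheta of (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = 1.0                   # global parameters (a scalar for illustration)
theta_ev1 = theta             # ev1 syncs its local copy from the global net
theta_ev2 = theta             # ev2 syncs from the same snapshot
grad1 = loss_grad(theta_ev1)  # grad1 computed w.r.t. the original theta
grad2 = loss_grad(theta_ev2)  # grad2 also computed w.r.t. the original theta
theta -= lr * grad1           # grad1 arrives first and moves the global theta
theta -= lr * grad2           # grad2 is now stale: evaluated at the old theta,
                              # but applied to the already-updated global theta
```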