openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

"global_step" in A3C #119

Closed: Emerald01 closed this issue 6 years ago

Emerald01 commented 7 years ago

Hi,

I am confused about the "global_step" implementation in the A3C class. It should track the global step count of the training loop, and it is managed by the Supervisor. However, I do not see where it gets updated in the source code. Maybe I missed something.

Firstly, what does the following line mean? I think what it does is local_step = global_step + state_dim, but why would state_dim need to be added on top of global_step?

inc_step = self.global_step.assign_add(tf.shape(pi.x)[0])
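For reference, in TensorFlow `tf.shape(x)[0]` is the size of the leading (first) dimension of `x`. Assuming `pi.x` holds a batch of observations shaped `[batch, *observation_shape]`, a plain-Python sketch of what that index picks out might look like this (the 42x42x1 frame size and the 20-step rollout are illustrative assumptions, and `shape` is a hypothetical helper, not a TensorFlow function):

```python
def shape(nested):
    """Return the shape of a rectangular nested list, similar to tf.shape."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)

# Hypothetical rollout: 20 observations, each a 42x42x1 frame (sizes assumed).
ROLLOUT_LEN = 20
frame = [[[0.0] for _ in range(42)] for _ in range(42)]
batch = [frame for _ in range(ROLLOUT_LEN)]

print(shape(batch))     # (20, 42, 42, 1)
print(shape(batch)[0])  # 20: the number of observations in the batch
```

Under that assumption, index `[0]` selects how many observations were fed in, not the size of a single state.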

Secondly, I do not see any ops acting on global_step. In train_op, inc_step is grouped together with the apply_gradients() op. I think this means that every time train_op runs, inc_step adds state_dim (per the code above), but what does that mean? Apart from that, global_step has no update ops as far as I can see.

self.train_op = tf.group(opt.apply_gradients(grads_and_vars), inc_step)

However, global_step is fetched as if it were an op, and I do not see where it operates. It looks to me like it magically increments somewhere:

fetches = [self.train_op, self.global_step]
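One way to read the "magic" is as a side effect: `assign_add` mutates the variable whenever an op that depends on it runs, and `tf.group` makes `train_op` run both of its inputs. Below is a plain-Python analogue of that behavior, assuming each training call receives one rollout batch. The class and function names are hypothetical stand-ins, not TensorFlow APIs, and the rollout lengths are made up for illustration:

```python
class GlobalStep:
    """Hypothetical stand-in for a shared tf.Variable holding the global step."""

    def __init__(self):
        self.value = 0

    def assign_add(self, delta):
        # Like tf.Variable.assign_add: mutate in place, return the new value.
        self.value += delta
        return self.value


def apply_gradients(batch):
    """Hypothetical placeholder for opt.apply_gradients(); does nothing here."""


def train_op(global_step, batch):
    # Analogue of tf.group(apply_gradients, inc_step): running it runs both.
    apply_gradients(batch)
    global_step.assign_add(len(batch))  # inc_step: += tf.shape(pi.x)[0]


def run(global_step, batch):
    # Analogue of sess.run([train_op, global_step]). Reading after the update
    # is a simplification: TF may not order the two fetches without a
    # control dependency.
    train_op(global_step, batch)
    return [None, global_step.value]


step = GlobalStep()
for rollout_len in (20, 20, 7):  # assumed lengths; the last rollout is cut short
    _, current = run(step, [object()] * rollout_len)
    print(current)  # 20, then 40, then 47
```

Nothing outside `train_op` touches the counter; it advances only because running `train_op` also runs the `assign_add`.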

caseypen commented 6 years ago

Hi, my understanding is that the global step counts how many frames you have gone through in the game. For A3C, each input x is taken every 4 frames, so each time you run pi (the policy network), it calculates how many frames (global steps) the game has run through. In this way, the global step increases and accumulates. global_step is a TensorFlow variable, handled as described here: https://www.tensorflow.org/api_docs/python/tf/get_variable. Glad to discuss, and if you think I'm wrong, feel free to let me know.

Emerald01 commented 6 years ago

Sounds right to me, then. Thank you.