hongzimao / pensieve

Neural Adaptive Video Streaming with Pensieve (SIGCOMM '17)
http://web.mit.edu/pensieve/
MIT License

Is multi_agent an actual A3C implementation? #104

Closed mkanakis closed 4 years ago

mkanakis commented 4 years ago

Hi,

I know this is more of a technicality, but I would like to clarify this.

A3C stands for Asynchronous Advantage Actor-Critic, whereas A2C can be considered its synchronous variant, as described by OpenAI.

Isn't it the case that in your multi_agent implementation you are using synchronous network updates for all agents, as this snippet suggests:

for i in xrange(NUM_AGENTS):
    # collect one batch of experience from each agent's queue
    s_batch, a_batch, r_batch, terminal, info = exp_queues[i].get()

    # compute gradients on the central agent
    actor_gradient, critic_gradient, td_batch = a3c.compute_gradients(
        s_batch=np.stack(s_batch, axis=0),
        a_batch=np.vstack(a_batch),
        r_batch=np.vstack(r_batch),
        terminal=terminal, actor=actor, critic=critic)

    actor_gradient_batch.append(actor_gradient)
    critic_gradient_batch.append(critic_gradient)

If that is the case, then the model is updated for all agents at the same time, which would make this an A2C rather than an A3C implementation?
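For reference, here is roughly what I understand happens after that loop (my paraphrase, possibly not a verbatim copy of multi_agent.py): the collected gradients are applied to the central networks in a single pass, and the updated parameters are then pushed back to every agent.

# paraphrased sketch, not a verbatim copy of the repository code
for i in xrange(len(actor_gradient_batch)):
    # apply every agent's gradients to the central actor/critic in one pass
    actor.apply_gradients(actor_gradient_batch[i])
    critic.apply_gradients(critic_gradient_batch[i])

# push the updated parameters back to all agents,
# so every agent resumes from the same snapshot
actor_net_params = actor.get_network_params()
critic_net_params = critic.get_network_params()
for i in xrange(NUM_AGENTS):
    net_params_queues[i].put([actor_net_params, critic_net_params])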

Is my thought process correct?

Kind regards, Marios.

hongzimao commented 4 years ago

Hi Marios,

Thanks for pointing this detail out! You are right, our implementation is essentially A2C. As mentioned by OpenAI, A2C sometimes outperforms A3C (which makes sense, because A3C can introduce more variance when the policy gradient doesn't contain off-policy corrections, as in IMPALA). We didn't need the extra speed boost from going fully asynchronous, so we stayed with A2C. Hope this helps!
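To make the difference concrete, here is a toy sketch of the two update schemes. This is illustrative pseudocode only, not code from this repo; collect_rollout and compute_gradients are placeholder helpers standing in for the worker rollout and the a3c.compute_gradients-style logic.

def collect_rollout(worker_id):
    # placeholder for an agent's rollout; returns a toy "experience" value
    return float(worker_id + 1)

def compute_gradients(rollout, params):
    # placeholder for a3c.compute_gradients-style logic; returns a toy gradient
    return rollout * params

def a2c_step(params, num_workers, lr=0.01):
    # Synchronous (A2C): collect every worker's rollout first, so all gradients
    # are computed against the same parameter snapshot, then apply them in one
    # central update.
    grads = [compute_gradients(collect_rollout(i), params)
             for i in range(num_workers)]
    return params - lr * sum(grads)

def a3c_step(params, num_workers, lr=0.01):
    # Asynchronous (A3C): each worker updates the shared parameters as soon as
    # its own rollout finishes, so later workers compute gradients against
    # parameters that earlier workers already changed (one source of the extra
    # variance mentioned above).
    for i in range(num_workers):
        params = params - lr * compute_gradients(collect_rollout(i), params)
    return params

print(a2c_step(1.0, num_workers=4), a3c_step(1.0, num_workers=4))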

Best, Hongzi

mkanakis commented 4 years ago

Thanks a lot for the clarification!

Closing.

Best, Marios.