openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

It seems that the training is decentralized? #7

Closed: pengzhenghao closed this issue 6 years ago

pengzhenghao commented 6 years ago

I have looked through train.py and found that you provide each agent with its own trainer:

```python
def get_trainers(env, num_adversaries, obs_shape_n, arglist):
    trainers = []
    model = mlp_model
    trainer = MADDPGAgentTrainer
    # one trainer (and hence one model) per adversary agent
    for i in range(num_adversaries):
        trainers.append(trainer(
            "agent_%d" % i, model, obs_shape_n, env.action_space, i, arglist,
            local_q_func=(arglist.adv_policy=='ddpg')))
    # one trainer per "good" agent
    for i in range(num_adversaries, env.n):
        trainers.append(trainer(
            "agent_%d" % i, model, obs_shape_n, env.action_space, i, arglist,
            local_q_func=(arglist.good_policy=='ddpg')))
    return trainers
```
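For context, train.py then uses this function roughly as follows (a sketch, the exact surrounding code may differ):

```python
# Rough sketch of how get_trainers is invoked in train.py (not an exact quote):
obs_shape_n = [env.observation_space[i].shape for i in range(env.n)]
num_adversaries = min(env.n, arglist.num_adversaries)
trainers = get_trainers(env, num_adversaries, obs_shape_n, arglist)
```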

Maybe I have a wrong understanding, but centralized training in my understanding means using a single, shared model to learn the Q-function. So I can't understand why each agent is assigned its own trainer (including its own model inside the trainer), since you end up training many models rather than only one.

I also found that even though you use reuse=True when setting up tf.variable_scope, each agent's model has variables with names like "agent_0/fully_connected/weights". That means the weights and biases of the models are not shared: agent_0 has its own model, agent_1 has its own model, and so on.
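Here is a minimal sketch (not the repo's code, just an illustration in TF 1.x) of why scoping each agent's network under its own name produces separate parameters:

```python
import tensorflow as tf  # TF 1.x, as used in this repo

# Illustration only: each agent builds its network under its own variable
# scope, so the variable names, and therefore the parameters, are distinct.
for i in range(2):
    with tf.variable_scope("agent_%d" % i):
        with tf.variable_scope("fully_connected"):
            tf.get_variable("weights", shape=[64, 64])

print([v.name for v in tf.global_variables()])
# e.g. ['agent_0/fully_connected/weights:0', 'agent_1/fully_connected/weights:0']
```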

So how can you say that the training of this multi-agent system is centralized?

Looking forward to your reply! Thanks!

ryan-lowe commented 6 years ago

Hi, It is true that each agent learns its own policy and Q-function. The training is centralized in the sense that the inputs to each Q function depend on the actions and observations of all the agents. Usually, for the training to be considered fully 'decentralized', each agent's policy and value are only functions of that agent's observation and actions. This is consistent with other papers in the literature (see e.g. the similar work on COMA, https://arxiv.org/pdf/1705.08926.pdf)