ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
34.19k stars, 5.8k forks

[rllib] Consider adding a grouping API for multi-agent #3547

Closed. ericl closed this issue 5 years ago

ericl commented 5 years ago

Describe the problem

It is common to have groups of agents in multi-agent RL, which can have either centralized or decentralized training and execution. While in principle this can already be modeled by either (1) implementing the grouping in the env, or (2) having policies that share layers / losses, there are a couple of problems with these approaches:

  1. Having to change the env definition to define groups makes it harder to switch between different approaches. Ideally the env definition is fixed and grouping can be defined as part of the training config.

  2. From a code simplicity perspective, sharing layers across policy graph objects is confusing (especially considering how it interacts with batching). It is more straightforward to write a group policy as in pymarl: https://github.com/oxwhirl/pymarl_alpha/blob/master/src/controllers/basic_controller.py

The implementation could be pretty simple: just an auto-generated env wrapper that merges agent observations into a Tuple of observations.
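
As a rough illustration of the wrapper idea above, here is a minimal sketch. It is not RLlib code: `GroupedAgentsWrapper` and `ToyEnv` are hypothetical names, the dict-based `reset()` / `step()` convention is an assumption about the wrapped env, and summing member rewards into one group reward is just one possible choice.

```python
from typing import Any, Callable, Dict, List


class GroupedAgentsWrapper:
    """Hypothetical wrapper that presents each group of agents as one agent.

    Assumes the wrapped env uses a dict-based multi-agent convention:
    reset() -> {agent_id: obs} and
    step({agent_id: action}) -> (obs, rewards, dones, infos), all dicts.
    """

    def __init__(self, env: Any, groups: Dict[str, List[str]]):
        self.env = env
        self.groups = groups
        # Reverse lookup: member agent id -> group id.
        self.agent_to_group: Dict[str, str] = {}
        for group_id, members in groups.items():
            for agent_id in members:
                if agent_id in self.agent_to_group:
                    raise ValueError(f"{agent_id} is in more than one group")
                self.agent_to_group[agent_id] = group_id

    def reset(self) -> Dict[str, Any]:
        return self._merge(self.env.reset(), tuple)

    def step(self, action_dict: Dict[str, Any]):
        # Split each group action (a tuple) back into per-agent actions.
        flat_actions: Dict[str, Any] = {}
        for agent_id, action in action_dict.items():
            if agent_id in self.groups:
                members = self.groups[agent_id]
                if len(action) != len(members):
                    raise ValueError(f"bad action length for {agent_id}")
                flat_actions.update(zip(members, action))
            else:
                flat_actions[agent_id] = action
        obs, rewards, dones, infos = self.env.step(flat_actions)
        return (
            self._merge(obs, tuple),   # observations become a tuple
            self._merge(rewards, sum), # assumption: group reward = sum
            self._merge(dones, all),   # group is done when all members are
            self._merge(infos, list),
        )

    def _merge(self, per_agent: Dict[str, Any], combine: Callable) -> Dict[str, Any]:
        merged: Dict[str, Any] = {}
        for group_id, members in self.groups.items():
            present = [a for a in members if a in per_agent]
            if not present:
                continue
            if len(present) != len(members):
                raise ValueError("grouped agents must act simultaneously")
            merged[group_id] = combine([per_agent[a] for a in members])
        # Ungrouped agents (and special keys such as "__all__") pass through.
        for key, value in per_agent.items():
            if key not in self.agent_to_group:
                merged[key] = value
        return merged


class ToyEnv:
    """Hypothetical stand-in env: echoes each action back as the next obs."""

    def reset(self) -> Dict[str, int]:
        return {"agent_1": 0, "agent_2": 1, "agent_3": 0, "agent_4": 1}

    def step(self, actions: Dict[str, int]):
        obs = dict(actions)
        rewards = {agent_id: 1.0 for agent_id in actions}
        dones = {agent_id: False for agent_id in actions}
        dones["__all__"] = False
        infos = {agent_id: {} for agent_id in actions}
        return obs, rewards, dones, infos
```

With `groups={"group1": ["agent_1", "agent_2", "agent_3"]}`, the policy side would see a single agent named `group1` whose observation and action are 3-tuples, while `agent_4` is left untouched.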

Possible API:

# No grouping: all agents are controlled by independent dqn policies
"multiagent": {
    "policy_mapping_fn": lambda agent_id: "dqn",
    "policy_graphs": {
        "dqn": (DQNPolicyGraph,  ...
    },
}

# Agents 1..3 are grouped into "group1" and controlled by the qmix policy.
# These agents must act simultaneously in the environment.
# All other agents are controlled by independent dqn policies.
"multiagent": {
    "grouping": {
        "group1": {
            "members": ["agent_1", "agent_2", "agent_3"],
            "obs_space":
                Tuple(Discrete(2), Discrete(2), Discrete(2)),  # potentially allow heterogeneous spaces
            "action_space":
                Tuple(Discrete(2), Discrete(2), Discrete(2)),
        },
    },
    "policy_mapping_fn": lambda agent_id:
        "qmix" if agent_id == "group1" else "dqn",
    "policy_graphs": {
        "qmix": (QMixPolicyGraph,  ...
        "dqn": (DQNPolicyGraph,  ...
    },
}
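
One way to read the second config is that the policy mapping function sees the group id in place of the member agent ids. The sketch below illustrates that reading only; `resolve_policies` is a hypothetical helper, not an existing RLlib function, and the proposal does not settle whether member ids remain visible elsewhere.

```python
from typing import Callable, Dict, List


def resolve_policies(
    agent_ids: List[str],
    grouping: Dict[str, Dict[str, List[str]]],
    policy_mapping_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Map each policy-facing id (group id or ungrouped agent id) to a policy."""
    member_to_group = {
        member: group_id
        for group_id, spec in grouping.items()
        for member in spec["members"]
    }
    # Assumption: grouped agents are replaced by their group id,
    # while ungrouped agents keep their own id.
    visible_ids: List[str] = []
    for agent_id in agent_ids:
        visible = member_to_group.get(agent_id, agent_id)
        if visible not in visible_ids:
            visible_ids.append(visible)
    return {visible: policy_mapping_fn(visible) for visible in visible_ids}


grouping = {"group1": {"members": ["agent_1", "agent_2", "agent_3"]}}
mapping = resolve_policies(
    ["agent_1", "agent_2", "agent_3", "agent_4"],
    grouping,
    lambda agent_id: "qmix" if agent_id == "group1" else "dqn",
)
# mapping == {"group1": "qmix", "agent_4": "dqn"}
```

Under this reading, `agent_1` through `agent_3` never reach the mapping function individually; only `group1` and `agent_4` do.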

eugenevinitsky commented 5 years ago

This is great; does this make creating a centralized value function work out of the box?

eugenevinitsky commented 5 years ago

An additional question: do the agents now have both a group id and an agent id? The line "qmix" if agent_id == "group1" else "dqn" makes it seem as though the individual agent ids no longer exist.

ericl commented 5 years ago

Kind of -- you end up with everything centralized, but you can use a custom model to ensure execution is actually distributed. I guess this is a useful starting point vs. everything being decentralized by default.