thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License

Does Tianshou truly support MARL out of the box? #1137

Open Legendorik opened 6 months ago

Legendorik commented 6 months ago

I'm trying to use PettingZoo with Tianshou, but the documentation doesn't explain much about training multiple agents at the same time. My agents have to act cooperatively and perform the same tasks, so they share the same policy. It looks simple — just replace the random policy in the example with the right one — and it even seems to work, but I'm not sure it works correctly.

  1. For some reason, only the first agent's rewards reach the reward_metric function; for the second agent they are always 0. Within the PettingZoo AEC environment, the rewards are successfully accumulated.

```python
def step(self, action):
    ...  # the very end of my step function
    if self._agent_selector.is_last():
        self._accumulate_rewards()  # global rewards are definitely updated
        self._clear_rewards()

    self.agent_selection = self._agent_selector.next()

def reward_metric(rews):
    return rews[:, 0]  # shape of rews is correct, but rews[:, 1] is always 0
```
  2. Am I initializing the policy correctly (with a shared network object; otherwise the second agent is not trained, although maybe this follows from the first point)?

```python
def _get_agents(...):
    ...
    if agent1 is None:
        net = Net(
            state_shape=observation_space.shape or observation_space.n,
            action_shape=env.action_space.shape or env.action_space.n,
            hidden_sizes=args.hidden_sizes,
            softmax=True,
            num_atoms=51,
            dueling_param=(
                {"linear_layer": noisy_linear},
                {"linear_layer": noisy_linear},
            ),
            device=args.device,
        ).to(args.device)
        if optim is None:
            optim = torch.optim.Adam(net.parameters(), lr=args.lr)
        agent1 = RainbowPolicy(
            model=net,
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)

        if args.watch:
            agent1.load_state_dict(torch.load('./log/ttt/dqn/policy_0.pth'))

    if agent2 is None:
        agent2 = RainbowPolicy(
            model=net,  # same network object as agent1
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)

        if args.watch:
            agent2.load_state_dict(torch.load('./log/ttt/dqn/policy_1.pth'))
```
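For what it's worth, because both RainbowPolicy instances above wrap the same `net` and `optim` objects, any gradient update made through either wrapper changes the one shared set of weights. A minimal, dependency-free sketch of that aliasing (`TinyPolicy` and the dict-based model are hypothetical stand-ins, not Tianshou classes):

```python
# Sketch of parameter sharing: both "policies" hold a reference to the same
# model object, so an update made through one is visible through the other.
class TinyPolicy:
    def __init__(self, model):
        self.model = model  # shared reference, not a copy

shared_model = {"w": 0.0}  # stands in for a torch.nn.Module
agent1 = TinyPolicy(shared_model)
agent2 = TinyPolicy(shared_model)

agent1.model["w"] = 1.0  # "training" agent1
print(agent2.model["w"])  # prints 1.0 — agent2 sees the update
```

This is why, with a truly shared network, it should not matter which agent's transitions drive the optimizer step.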

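On point 1: for cooperative agents one would typically aggregate the per-agent columns in reward_metric rather than select a single one — though if rews[:, 1] is genuinely always 0, the problem is upstream of the metric. A hedged numpy sketch, assuming the same (batch_size, n_agents) shape as the snippet above (exact semantics depend on how the trainer calls it):

```python
import numpy as np

# Hypothetical cooperative reward metric: average the per-agent columns of
# the (batch_size, n_agents) reward array instead of keeping only rews[:, 0].
def reward_metric(rews: np.ndarray) -> np.ndarray:
    return rews.mean(axis=1)

rews = np.array([[1.0, 0.0],
                 [0.5, 0.5]])
print(reward_metric(rews))  # [0.5 0.5]
```

Summing with `rews.sum(axis=1)` would work equally well here; the choice only rescales the logged metric.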
Tianshou: 1.0.0, PettingZoo: 1.24.3

Thank you for your time!

MischaPanch commented 6 months ago

Hi @Legendorik

The core team that has been working on Tianshou for the last 6 months has deprioritized MARL — there are just too many other things that need fixing first, and it's more of a niche topic.

I'm afraid that for the foreseeable future we won't be able to help you with that, but I'm happy to review a PR if you decide to dive deeper into MARL with Tianshou.

Maybe @Trinkle23897 or @ChenDRAG can help though