pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[Feature Request] Working Multi-Agent DDPG Implementation #1317

Open Acciorocketships opened 1 year ago

Acciorocketships commented 1 year ago

Currently, there is a working multi-agent PPO implementation here: https://github.com/matteobettini/rl/blob/mappo_ippo/examples/multiagent/mappo_ippo.py

and a working single-agent DDPG implementation here: https://github.com/pytorch/rl/blob/ddpg_example/ddpg_example.py

However, there does not seem to be a working multi-agent DDPG implementation (the multi-agent DDPG example in the same repo as the first link runs into an error). Would it be possible to provide a multi-agent DDPG example script? I am specifically interested in using it with VMAS.

cc @matteobettini

matteobettini commented 1 year ago

Right, so the situation is this: in this branch (https://github.com/matteobettini/rl/tree/mappo_ippo) I implemented the MARL examples in VMAS.

I am currently reworking them, since we have decided that agent-specific keys will sit at a deeper nested level in tensordicts. While this has been simpler for PPO-type losses, it has brought to light many other bugs related to using nested keys in torchrl (#1279, #1278, #1273, #1269, #1268), which we will solve as soon as possible.

However, I have created a tag for the last working version of all scripts, made for the paper submission: https://github.com/matteobettini/rl/tree/torchrl_paper . If you want to use MADDPG, I suggest checking out that tag. You will also have to check out the commit in the tensordict repository that is closest in date to the commit the tag points to.

Acciorocketships commented 1 year ago

Thank you, please let me know when it works on the current version!

Also, I think that script might need to be updated with the fixes from this thread: https://github.com/pytorch/rl/issues/1181

matteobettini commented 1 year ago

Yep, we'll take that into account

matteobettini commented 1 year ago

MADDPG on that branch now works. What specifically should we add from https://github.com/pytorch/rl/issues/1181? @smorad @Acciorocketships, do you have any hints, since you have used DDPG a lot?

smorad commented 1 year ago

You definitely need target networks. I'd also suggest prioritized experience replay and a large replay buffer (and use extend rather than add to put things in the buffer).
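For readers unfamiliar with the target-network suggestion, here is a minimal sketch of a soft (Polyak) target update. It is plain PyTorch, not TorchRL's own API and not code from the scripts discussed in this thread. The `soft_update` helper, the network sizes, and `tau=0.005` are all assumptions chosen for illustration.

```python
import copy

import torch
from torch import nn


def soft_update(target: nn.Module, source: nn.Module, tau: float) -> None:
    """Blend the source parameters into the target parameters, in place.

    Hypothetical helper for illustration: target <- (1 - tau) * target + tau * source.
    Only parameters are blended; module buffers (e.g. BatchNorm statistics) are ignored.
    """
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(s_param, alpha=tau)


# Assumed toy critic: 4-dim observation concatenated with a 2-dim action -> scalar Q-value.
critic = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))

# The target critic starts as an exact copy and is never trained directly.
target_critic = copy.deepcopy(critic)
for param in target_critic.parameters():
    param.requires_grad_(False)

# Call this after each optimizer step on the critic (tau value is an assumption).
soft_update(target_critic, critic, tau=0.005)
```

The target critic would then be used to compute the bootstrapped value in the critic loss, so that the regression target changes slowly relative to the network being trained. A small `tau` keeps the target close to its previous value; `tau=1.0` reduces to a hard copy.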

matteobettini commented 1 year ago

MADDPG is now working in PR https://github.com/pytorch/rl/pull/1027. We aim to merge that PR soon.