vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Cleanrl for MARL #330

Closed vbaddam closed 8 months ago

vbaddam commented 1 year ago

Contribution to MARL

I would like to contribute to the CleanRL repo by extending RL algorithms to multi-agent systems (i.e., MARL). I have discussed this with @vwxyzjn, and he suggested starting an issue here. If anyone is interested in contributing to MARL, please respond here. Going forward, we can lay out the roadmap and share the responsibilities.

Thank you.

vwxyzjn commented 1 year ago

Thanks, @vbaddam. MARL is an exciting research field that we would love to get into. May I ask which algorithm you are thinking of contributing?

Two related papers/projects recently caught my attention: https://arxiv.org/abs/2209.10485 and https://github.com/oxwhirl/smacv2.

vbaddam commented 1 year ago

I'm thinking of starting with MADDPG, since it is one of the first MARL algorithms that came out. We can then extend it to MATD3 and MSAC (offline and direct implementations of the single-agent RL algorithms), as these algorithms overcome the issues of MADDPG. Along with that, we can look at implementing MAPPO (significant results are published here: https://arxiv.org/pdf/2103.01955.pdf).

Thanks for sharing the paper. It looks like a useful resource.

vwxyzjn commented 1 year ago

Oh cool. What is the simulation environment for MADDPG? Is it MuJoCo?

vbaddam commented 1 year ago

I will start with MultiAgent MuJoCo (https://github.com/schroederdewitt/multiagent_mujoco), since it could help us see the direct difference between the single-agent and multi-agent settings.

51616 commented 1 year ago

@vbaddam @vwxyzjn I can lend a hand with implementing MARL algos. I have a working MAPPO + MAMujoco implementation in torch. I think the big difference from single-agent code is the design of parameter sharing and of agent and data handling. Please let me know if we should discuss this elsewhere.
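To make the parameter-sharing and data-handling point concrete, here is a minimal sketch in plain Python. It is not taken from any implementation mentioned in this thread. It assumes homogeneous agents (same observation size), per-agent observations stored in a dict, and a one-hot agent ID appended so a single shared policy can tell agents apart. `batch_observations`, `SharedLinearPolicy`, and `unbatch_actions` are hypothetical names used only for illustration.

```python
import random


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def batch_observations(obs_by_agent, agent_order):
    """Stack a per-agent observation dict into rows for one shared policy.

    Each row is the agent's observation followed by a one-hot agent ID
    (an assumption of this sketch, not a requirement of MAPPO).
    """
    n_agents = len(agent_order)
    return [
        list(obs_by_agent[agent]) + one_hot(i, n_agents)
        for i, agent in enumerate(agent_order)
    ]


class SharedLinearPolicy:
    """A single weight matrix reused by every agent (parameter sharing)."""

    def __init__(self, in_dim, act_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(act_dim)
        ]

    def act(self, rows):
        # The same weights are applied to every agent's row.
        return [
            [sum(w * x for w, x in zip(w_row, row)) for w_row in self.weights]
            for row in rows
        ]


def unbatch_actions(actions, agent_order):
    """Map batched actions back to the per-agent dict an environment expects."""
    return {agent: action for agent, action in zip(agent_order, actions)}


agent_order = ["agent_0", "agent_1"]
obs = {"agent_0": [0.5, -1.0, 2.0], "agent_1": [0.0, 1.0, 0.0]}
rows = batch_observations(obs, agent_order)
policy = SharedLinearPolicy(in_dim=len(rows[0]), act_dim=2)
actions = unbatch_actions(policy.act(rows), agent_order)
```

Without parameter sharing, the same batching code would instead dispatch each row to a separate policy object per agent, which is the main structural choice to settle before writing the training loop.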

vwxyzjn commented 1 year ago

Thanks a lot @vbaddam and @51616. This is the perfect place to discuss. MADDPG in MultiAgent MuJoCo sounds great. I hope its installation won't cause too many issues (e.g., dependency conflicts). One quick suggestion I have is to maybe implement it in JAX, since DDPG with JAX is a lot faster, and parameter sharing is more intuitive in JAX. That said, feel free to pick your tech stack.

vbaddam commented 1 year ago

Yes, sure, @51616. It would be great to have you on board. Should we set up a meeting and discuss the structure so we can be on the same page?

@vwxyzjn I think using JAX is a good suggestion. However, I'm still catching up with JAX. Maybe I can implement it using PyTorch and extend it for the next iterations.

51616 commented 1 year ago

@vbaddam Sounds great to me! Please hit me up after the holiday.

rodrigodelazcano commented 1 year ago

Hello everyone! I just wanted to jump into the conversation. @Kallinteris-Andreas has done amazing work refactoring the MultiAgent MuJoCo environments into the Gymnasium-Robotics repo. They use the PettingZoo API, and the documentation can be found here: https://robotics.farama.org/envs/MaMuJoCo/ma_half_cheetah/.

We are actively maintaining this repo, and it would be great to see benchmarks of these environments with new CleanRL MARL algos.

Also, we are waiting to benchmark these environments before making a new release, so for the time being they have to be installed from source.
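For readers unfamiliar with the PettingZoo Parallel API shape, the rollout loop looks roughly like the sketch below. Since the MaMuJoCo environments have to be installed from source, this uses `ToyParallelEnv`, a hypothetical stand-in written only for illustration. It mimics the dict-per-agent `reset`/`step` return values but is not the real environment, and `rollout` is likewise an illustrative helper.

```python
import random


class ToyParallelEnv:
    """Hypothetical stand-in mimicking the PettingZoo Parallel API shape."""

    def __init__(self, n_agents=2, max_steps=5):
        self.possible_agents = [f"agent_{i}" for i in range(n_agents)]
        self.agents = []
        self.max_steps = max_steps
        self._t = 0
        self._rng = random.Random(0)

    def reset(self, seed=None):
        self._rng = random.Random(seed)
        self.agents = list(self.possible_agents)
        self._t = 0
        observations = {a: [self._rng.random()] for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        self._t += 1
        acting = list(self.agents)
        observations = {a: [self._rng.random()] for a in acting}
        rewards = {a: -abs(actions[a]) for a in acting}  # toy reward
        terminations = {a: False for a in acting}
        truncations = {a: self._t >= self.max_steps for a in acting}
        infos = {a: {} for a in acting}
        # Finished agents are dropped from `agents`, which ends the loop below.
        self.agents = [
            a for a in acting if not (terminations[a] or truncations[a])
        ]
        return observations, rewards, terminations, truncations, infos


def rollout(env, policy, seed=0):
    """Run one episode; every per-step quantity is a dict keyed by agent name."""
    observations, infos = env.reset(seed=seed)
    returns = {a: 0.0 for a in env.agents}
    steps = 0
    while env.agents:
        actions = {a: policy(observations[a]) for a in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            returns[agent] += reward
        steps += 1
    return returns, steps


returns, steps = rollout(ToyParallelEnv(), policy=lambda obs: 0.5)
```

Swapping in a real environment should only require replacing `ToyParallelEnv()` with the environment constructor from the linked documentation, assuming it follows the same Parallel API.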

kinalmehta commented 1 year ago

Hello everyone! I have been working on MARL with JAX and PyTorch and can be of help. Let me know what you guys have planned out.

51616 commented 1 year ago

@vbaddam Could you make a roadmap + progress tracker for this?

vbaddam commented 1 year ago

Here is the checklist and progress tracker. I will add the clean roadmap once we finish Stage 1.

ffelten commented 1 year ago

Hi, I'm also interested in seeing this implemented! I have recently been trying to adapt SAC from CleanRL to the multi-agent setting: https://github.com/ffelten/MASAC.

It is still a WIP, but early results suggest it is learning. I would love to hear feedback and/or tips for adapting algorithms this way.

Cheers,

Halmoni100 commented 1 year ago

Hello! I am working on a MARL project with a couple of other people, so I'd be interested in the roadmap and progress.

vwxyzjn commented 8 months ago

Looks like https://github.com/kinalmehta/marl-jax came up. Closing this for now.