To expand its suite of multi-agent RL algorithms, RLlib would greatly benefit from an implementation of the counterfactual multi-agent policy gradients (COMA) algorithm from Foerster et al. (https://arxiv.org/abs/1705.08926). Is anyone aware of an existing implementation in RLlib, or of ongoing work in this direction?
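For context, the core of COMA is a counterfactual advantage: a centralized critic scores the joint action, and the baseline marginalizes out a single agent's action under that agent's own policy while holding the other agents' actions fixed. Here is a minimal, framework-agnostic sketch of that computation (function and variable names are illustrative, not RLlib API):

```python
import numpy as np

def counterfactual_advantage(q_values, pi, taken_action):
    """COMA-style advantage for one agent (Foerster et al., 2017).

    q_values:     (n_actions,) centralized-critic values Q(s, (u^-a, u'^a)),
                  varying only this agent's action u'^a while the other
                  agents' actions u^-a are held fixed.
    pi:           (n_actions,) this agent's policy probabilities pi(u'^a | tau^a).
    taken_action: index of the action the agent actually executed.
    """
    # Counterfactual baseline: expected Q under the agent's own policy.
    baseline = np.dot(pi, q_values)
    # Advantage of the joint action actually taken vs. that baseline.
    return q_values[taken_action] - baseline

# Toy usage: 3 actions, critic prefers action 2, near-uniform policy.
q = np.array([1.0, 0.5, 2.0])
pi = np.array([0.3, 0.3, 0.4])
print(counterfactual_advantage(q, pi, taken_action=2))  # 2.0 - 1.25 = 0.75
```

The actor update is then the usual policy gradient, with the log-probability of the taken action scaled by this advantage instead of a learned state-value baseline.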
I don't think anyone is working on this. Is its performance significantly better than that of other MARL algorithms, though? (i.e., would this be of practical interest?)
@ericl I think it would be of practical interest because COMA is one of the most cited MARL algorithms, alongside MADDPG and QMIX. I haven't seen a direct comparison with QMIX, but COMA outperforms MADDPG in most reported environments (although, to be fair, MADDPG seems to perform poorly in anything but the particle envs it was developed on).
I think the real practical interest would come from building out a suite of MARL algorithms, much as RLlib already provides a suite of single-agent RL algorithms (PPO, SAC, DQN, etc.). It is difficult to benchmark MARL algorithms against one another because no single library has implementations of all the major algorithms. I would hope RLlib could serve that purpose; see the sketch below for roughly what that could look like.
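For concreteness, here is a rough sketch of swapping interchangeable trainers into one multi-agent setup, assuming RLlib's multi-agent config API circa Ray 1.x (the exact config keys, space inference, and bundled example env may differ across versions; a "COMA" trainer is hypothetical):

```python
import ray
from ray import tune
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole

ray.init()

tune.run(
    "PPO",  # a hypothetical "COMA" trainer could be swapped in here
    stop={"timesteps_total": 100_000},
    config={
        "env": MultiAgentCartPole,
        "env_config": {"num_agents": 2},
        "multiagent": {
            # (policy_cls, obs_space, act_space, config); the Nones are
            # inferred from the env in recent RLlib versions (older ones
            # need the spaces spelled out explicitly).
            "policies": {"shared_policy": (None, None, None, {})},
            # Both agents share one policy in this sketch.
            "policy_mapping_fn": lambda agent_id: "shared_policy",
        },
    },
)
```

A shared trainer/config interface like this is what would let COMA be benchmarked head-to-head against QMIX and MADDPG without per-library glue code.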
I've been doing some reading on COMA and, similar to @rallen10's comment, I'd love to see COMA in RLlib. Noticed this was closed; I'd be interested in potentially working on this on the side.
+1