MADiff: Offline Multi-agent Learning with Diffusion Models

This is the official implementation of "MADiff: Offline Multi-agent Learning with Diffusion Models".


Performance

We omit the standard deviation of the results for brevity. The full results can be found in our paper.

Multi-agent Particle Environment (MPE)

Performance on the MPE datasets released with the OMAR paper.

Dataset	Task	BC	MA-ICQ	MA-TD3+BC	MA-CQL	OMAR	MADiff-D	MADiff-C*
Expert	Spread	35.0	104.0	108.3	98.2	114.9	97.0	116.0
Expert	Tag	40.0	113.0	115.2	93.9	116.2	123.9	168.3
Expert	World	33.0	109.5	110.3	71.9	110.4	115.4	178.9
Md-Replay	Spread	10.0	13.6	15.4	20.0	37.9	29.1	43.1
Md-Replay	Tag	0.9	34.5	28.7	24.8	47.1	63.0	98.8
Md-Replay	World	2.3	12.0	17.4	29.6	42.9	60.3	84.9
Medium	Spread	31.6	29.3	29.3	34.1	47.9	64.7	58.0
Medium	Tag	22.5	63.3	65.1	61.7	66.7	78.3	133.5
Medium	World	25.3	71.9	73.4	58.6	74.6	124.2	157.1
Random	Spread	-0.5	6.3	9.8	24.0	34.4	7.2	5.0
Random	Tag	1.2	2.2	5.7	5.0	11.1	4.6	10.0
Random	World	-2.4	1.0	2.8	0.6	5.9	0.7	6.1

Multi-agent Mujoco (MA-Mujoco)

Performance on the MA-Mujoco datasets released with the off-the-grid MARL benchmark.

Dataset	Task	BC	MA-TD3+BC	OMAR	MADiff-D	MADiff-C*
Good	2halfcheetah	6846	7025	1434	8254	8662
Medium	2halfcheetah	1627	2561	1892	2215	2221
Poor	2halfcheetah	465	736	384	751	767
Good	2ant	2697	2922	464	2940	3105
Medium	2ant	1145	744	799	1210	1241
Poor	2ant	954	1256	857	902	1037
Good	4ant	2802	2628	344	3090	3087
Medium	4ant	1617	1843	929	1679	1897
Poor	4ant	1033	1075	518	1268	1332

StarCraft Multi-Agent Challenge (SMAC)

Performance on the SMAC datasets released with the off-the-grid MARL benchmark.

Dataset	Task	BC	QMIX	MA-ICQ	MA-CQL	MADT	MADiff-D	MADiff-C*
Good	3m	16.0	13.8	18.8	19.6	19.0	19.6	20.0
Medium	3m	8.2	17.3	18.1	18.9	15.8	17.2	18.0
Poor	3m	4.4	10.0	14.4	5.8	4.2	8.9	9.3
Good	2s3z	18.2	5.9	19.6	19.0	19.3	19.4	19.5
Medium	2s3z	12.3	5.2	17.2	14.3	15.9	17.4	17.7
Poor	2s3z	6.7	3.8	12.1	10.1	7.0	9.9	10.8
Good	5m6m	16.6	8.0	16.3	13.8	16.8	18.0	18.2
Medium	5m6m	12.4	12.0	15.3	17.0	16.1	17.5	18.0
Poor	5m6m	7.5	10.7	9.4	10.4	7.6	8.9	9.5
Good	8m	16.7	4.6	19.6	11.3	18.5	19.2	20.0
Medium	8m	10.7	13.9	18.6	16.8	18.2	19.2	19.5
Poor	8m	5.3	6.0	10.8	4.6	4.8	5.1	5.2

* MADiff-C (centralized control) is not intended as a fair comparison with the baseline methods. It is included to show whether MADiff-D closes the coordination gap that arises when agents act without global information.

Setup

Installation

sudo apt-get update
sudo apt-get install libssl-dev libcurl4-openssl-dev swig
conda create -n madiff python=3.8
conda activate madiff
pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
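After installation (and after the environment-specific steps below), it can help to confirm which packages are importable before launching an experiment. The sketch below is not part of the repository; the import names it checks (torch, multiagent, multiagent_mujoco, smac, og_marl) are assumptions about what the listed pip commands install, so adjust them if your environment differs.

```python
import importlib.util

# Import names are assumptions about the packages installed in this README;
# only the environments you actually use need to be present.
OPTIONAL_MODULES = {
    "torch": "PyTorch (pip install torch==1.12.1+cu113 ...)",
    "multiagent": "MPE (third_party/multiagent-particle-envs)",
    "multiagent_mujoco": "MA-Mujoco (third_party/multiagent_mujoco)",
    "smac": "SMAC (oxwhirl/smac)",
    "og_marl": "off-the-grid MARL (third_party/og-marl)",
}


def check_modules(modules=OPTIONAL_MODULES):
    """Return {module_name: bool} telling whether each module can be found."""
    status = {}
    for name in modules:
        # find_spec locates a top-level module without importing it,
        # and returns None when the module is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_modules().items():
        label = "OK     " if ok else "MISSING"
        print(f"{label} {name:<18} {OPTIONAL_MODULES[name]}")
```

A missing entry only matters for the environment that needs it; for example, SMAC experiments do not require the MPE package.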

Setup MPE

We use the MPE dataset from OMAR. The dataset download links and instructions can be found in OMAR's repository. Since their BaiduPan download links may be inconvenient for users outside China, we maintain an anonymous mirror repository on OSF from which the dataset can also be downloaded.

The downloaded dataset should be placed under diffuser/datasets/data/mpe.
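Each environment's data goes under the same root, diffuser/datasets/data/<env>, where <env> is mpe, mamujoco, or smac. The following is a small sketch (not part of the repository) that builds the expected path and checks that something has been placed there; the file layout inside each directory is not specified in this README, so it only checks for a non-empty directory.

```python
from pathlib import Path

# Dataset root and sub-directory names as given in this README.
DATA_ROOT = Path("diffuser/datasets/data")
KNOWN_ENVS = ("mpe", "mamujoco", "smac")


def dataset_dir(env_name: str, repo_root: Path = Path(".")) -> Path:
    """Return the expected dataset directory for env_name under repo_root."""
    if env_name not in KNOWN_ENVS:
        raise ValueError(f"unknown env {env_name!r}; expected one of {KNOWN_ENVS}")
    return Path(repo_root) / DATA_ROOT / env_name


def has_data(env_name: str, repo_root: Path = Path(".")) -> bool:
    """True if the dataset directory exists and contains at least one entry."""
    d = dataset_dir(env_name, repo_root)
    return d.is_dir() and any(d.iterdir())


if __name__ == "__main__":
    for env in KNOWN_ENVS:
        print(f"{env:<9} {dataset_dir(env)}  present={has_data(env)}")
```

Run it from the repository root, since the paths are relative.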

Install the MPE environment:

pip install -e third_party/multiagent-particle-envs
pip install -e third_party/ddpg-agent

Setup MA-Mujoco

Install MA-Mujoco:

pip install -e third_party/multiagent_mujoco

We use the MA-Mujoco dataset from off-the-grid MARL. We preprocess the dataset by concatenating trajectories into full episodes and saving them as .npy files for easier loading. The original dataset can be downloaded from the links below.

2halfcheetah
2ant
4ant

The downloaded dataset should be placed under diffuser/datasets/data/mamujoco.

Install off-the-grid MARL, then transform the original dataset:

pip install -r ./third_party/og-marl/install_environments/requirements/mamujoco.txt
pip install -e ./third_party/og-marl
python scripts/transform_og_marl_dataset.py --env_name mamujoco --map_name <map> --quality <dataset>
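The actual conversion is implemented in scripts/transform_og_marl_dataset.py and is not reproduced here. As a rough illustration of what "concatenating trajectories into full episodes and saving them as .npy files" involves, the sketch below assumes each raw record is an ordered, non-overlapping, unpadded chunk stored as a dict of arrays with a shared time axis and a boolean "terminals" field. The field names, the zero-padding scheme, and the path_lengths.npy output are hypothetical choices for this example, not the repository's format.

```python
from pathlib import Path

import numpy as np


def chunks_to_episodes(chunks):
    """Merge consecutive fixed-length chunks into full episodes.

    Each chunk is a dict of arrays sharing the same leading time axis and
    containing a boolean "terminals" array of shape [T] (assumed format).
    """
    # Concatenate every field along time so episode boundaries can be found
    # regardless of where the chunks were cut.
    flat = {k: np.concatenate([c[k] for c in chunks], axis=0) for k in chunks[0]}
    ends = np.flatnonzero(flat["terminals"]) + 1  # exclusive end indices
    episodes, start = [], 0
    for end in ends:
        episodes.append({k: v[start:end] for k, v in flat.items()})
        start = end
    # Trailing steps without a terminal flag are an incomplete episode;
    # this sketch simply drops them.
    return episodes


def save_episodes(episodes, out_dir):
    """Zero-pad episodes to a common length and save each field as .npy."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lengths = np.array([len(ep["terminals"]) for ep in episodes])
    max_len = int(lengths.max())
    for k in episodes[0]:
        padded = []
        for ep in episodes:
            arr = ep[k]
            # Pad only the time axis; other axes (agents, features) are kept.
            pad = [(0, max_len - len(arr))] + [(0, 0)] * (arr.ndim - 1)
            padded.append(np.pad(arr, pad))
        np.save(out_dir / f"{k}.npy", np.stack(padded))
    np.save(out_dir / "path_lengths.npy", lengths)
    return lengths
```

Storing per-episode lengths alongside the padded arrays lets a loader slice out valid steps without scanning for terminal flags again.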

Setup SMAC

Run scripts/smac.sh to install StarCraft II.

Install SMAC:

pip install git+https://github.com/oxwhirl/smac.git

We use the SMAC dataset from off-the-grid MARL. We preprocess the dataset by concatenating trajectories into full episodes and saving them as .npy files for easier loading. The original dataset can be downloaded from the links below.

3m
2s3z
5m_vs_6m
8m

The downloaded dataset should be placed under diffuser/datasets/data/smac.

Install off-the-grid MARL, then transform the original dataset:

pip install -r ./third_party/og-marl/install_environments/requirements/smacv1.txt
pip install -e ./third_party/og-marl
python scripts/transform_og_marl_dataset.py --env_name smac --map_name <map> --quality <dataset>
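The transform command runs once per map and dataset quality. A minimal sketch for batching it, using only the flags shown above: the map names come from the SMAC list in this section, while the quality strings ("Good", "Medium", "Poor") are an assumption taken from the results tables, so check scripts/transform_og_marl_dataset.py for the exact values it accepts. The helper names are hypothetical.

```python
import shlex
import subprocess
import sys

SMAC_MAPS = ["3m", "2s3z", "5m_vs_6m", "8m"]
QUALITIES = ["Good", "Medium", "Poor"]  # assumed spelling, verify first


def transform_commands(env_name, maps, qualities):
    """Build one transform command per (map, quality) pair."""
    return [
        [sys.executable, "scripts/transform_og_marl_dataset.py",
         "--env_name", env_name, "--map_name", m, "--quality", q]
        for m in maps
        for q in qualities
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first conversion that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(transform_commands("smac", SMAC_MAPS, QUALITIES))
```

The same helper covers MA-Mujoco by passing "mamujoco" with the maps 2halfcheetah, 2ant, and 4ant. The default dry run only prints the commands, so the quality spelling can be checked before anything is executed.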

Training and Evaluation

To start training, run one of the following commands, replacing <task>/<map> and <dataset> with the desired task and dataset:

# multi-agent particle environment
python run_experiment.py -e exp_specs/mpe/<task>/mad_mpe_<task>_attn_<dataset>.yaml  # CTCE
python run_experiment.py -e exp_specs/mpe/<task>/mad_mpe_<task>_ctde_<dataset>.yaml  # CTDE
# ma-mujoco
python run_experiment.py -e exp_specs/mamujoco/<task>/mad_mamujoco_<task>_attn_<dataset>_history.yaml  # CTCE
python run_experiment.py -e exp_specs/mamujoco/<task>/mad_mamujoco_<task>_ctde_<dataset>_history.yaml  # CTDE
# smac
python run_experiment.py -e exp_specs/smac/<map>/mad_smac_<map>_attn_<dataset>_history.yaml  # CTCE
python run_experiment.py -e exp_specs/smac/<map>/mad_smac_<map>_ctde_<dataset>_history.yaml  # CTDE
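The config paths follow a fixed naming pattern: attn configs are CTCE and ctde configs are CTDE, and only the MA-Mujoco and SMAC names carry a _history suffix. The sketch below simply encodes the patterns listed above; the function names are hypothetical, and the exact dataset strings accepted in the file names are whatever exists under exp_specs in the repository.

```python
import sys
from pathlib import PurePosixPath

# "attn" configs are CTCE, "ctde" configs are CTDE (per the comments above).
MODE_TAGS = {"ctce": "attn", "ctde": "ctde"}
# Only MA-Mujoco and SMAC config names end with "_history".
HISTORY_SUFFIX = {"mpe": "", "mamujoco": "_history", "smac": "_history"}


def exp_spec_path(env, task, dataset, mode="ctde"):
    """Return the exp_specs YAML path for an env/task/dataset/mode combination."""
    if env not in HISTORY_SUFFIX:
        raise ValueError(f"unknown env {env!r}; expected one of {sorted(HISTORY_SUFFIX)}")
    tag = MODE_TAGS[mode.lower()]
    name = f"mad_{env}_{task}_{tag}_{dataset}{HISTORY_SUFFIX[env]}.yaml"
    return str(PurePosixPath("exp_specs", env, task, name))


def train_command(env, task, dataset, mode="ctde"):
    """Argument list equivalent to: python run_experiment.py -e <spec>."""
    return [sys.executable, "run_experiment.py", "-e",
            exp_spec_path(env, task, dataset, mode)]
```

Passing the result of train_command to subprocess.run from the repository root is equivalent to typing the corresponding command above; it is mainly useful when sweeping several tasks or datasets from a script.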

To evaluate trained models, first set log_dir in exp_specs/eval_inv.yaml to the log directories of the runs you want to evaluate, then run:

python run_experiment.py -e exp_specs/eval_inv.yaml
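Editing the YAML by hand is the simplest route. If you prefer to script it, here is a sketch that replaces every log_dir entry in a parsed spec. Because this README does not say where log_dir sits inside exp_specs/eval_inv.yaml, the sketch walks all nested dicts and lists. The function names are hypothetical, and update_eval_spec assumes PyYAML is installed and the file contains only plain YAML.

```python
def replace_log_dir(spec, new_log_dir):
    """Return a copy of spec with every "log_dir" value set to new_log_dir.

    Nested dicts and lists are walked recursively; other values are returned
    unchanged. new_log_dir may be a single path or a list of paths.
    """
    if isinstance(spec, dict):
        return {
            k: (new_log_dir if k == "log_dir" else replace_log_dir(v, new_log_dir))
            for k, v in spec.items()
        }
    if isinstance(spec, list):
        return [replace_log_dir(v, new_log_dir) for v in spec]
    return spec


def update_eval_spec(path, new_log_dir):
    """Rewrite the YAML file at path in place (requires PyYAML)."""
    import yaml  # imported lazily so replace_log_dir works without PyYAML

    with open(path) as f:
        spec = yaml.safe_load(f)
    with open(path, "w") as f:
        yaml.safe_dump(replace_log_dir(spec, new_log_dir), f, sort_keys=False)
```

Note that round-tripping through safe_load/safe_dump drops any comments in the file, which is one reason to keep a copy of the original before calling update_eval_spec("exp_specs/eval_inv.yaml", "<your_log_dir>").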

Citation

@article{zhu2023madiff,
  title={MADiff: Offline Multi-agent Learning with Diffusion Models},
  author={Zhu, Zhengbang and Liu, Minghuan and Mao, Liyuan and Kang, Bingyi and Xu, Minkai and Yu, Yong and Ermon, Stefano and Zhang, Weinan},
  journal={arXiv preprint arXiv:2305.17330},
  year={2023}
}

Acknowledgements

The codebase is built upon the decision-diffuser and ILSwiss repositories.
