zbzhu99 / madiff

Implementation of "MADiff: Offline Multi-agent Learning with Diffusion Models"
MIT License
40 stars 9 forks source link

MADiff: Offline Multi-agent Learning with Diffusion Models

Python 3.8 Code style MIT arXiv

This is the official implementation of "MADiff: Offline Multi-agent Learning with Diffusion Models".

MADiff

Performances

We omit the standard deviation of the results for brevity. The full results can be found in our paper.

Multi-agent Particle Environment (MPE)

The peformances on MPE datasets released in OMAR paper.

Dataset Task BC MA-ICQ MA-TD3+BC MA-CQL OMAR MADiff-D MADiff-C*
Expert Spread 35.0 104.0 108.3 98.2 114.9 97.0 116.0
Expert Tag 40.0 113.0 115.2 93.9 116.2 123.9 168.3
Expert World 33.0 109.5 110.3 71.9 110.4 115.4 178.9
Md-Replay Spread 10.0 13.6 15.4 20.0 37.9 29.1 43.1
Md-Replay Tag 0.9 34.5 28.7 24.8 47.1 63.0 98.8
Md-Replay World 2.3 12.0 17.4 29.6 42.9 60.3 84.9
Medium Spread 31.6 29.3 29.3 34.1 47.9 64.7 58.0
Medium Tag 22.5 63.3 65.1 61.7 66.7 78.3 133.5
Medium World 25.3 71.9 73.4 58.6 74.6 124.2 157.1
Random Spread -0.5 6.3 9.8 24.0 34.4 7.2 5.0
Random Tag 1.2 2.2 5.7 5.0 11.1 4.6 10.0
Random World -2.4 1.0 2.8 0.6 5.9 0.7 6.1

Multi-agent Mujoco (MA-Mujoco)

The peformances on MA-Mujoco datasets released in off-the-grid MARL benchmark.

Dataset Task BC MA-TD3+BC OMAR MADiff-D MADiff-C*
Good 2halfcheetah 6846 7025 1434 8254 8662
Medium 2halfcheetah 1627 2561 1892 2215 2221
Poor 2halfcheetah 465 736 384 751 767
Good 2ant 2697 2922 464 2940 3105
Medium 2ant 1145 744 799 1210 1241
Poor 2ant 954 1256 857 902 1037
Good 4ant 2802 2628 344 3090 3087
Medium 4ant 1617 1843 929 1679 1897
Poor 4ant 1033 1075 518 1268 1332

StarCraft Multi-Agent Challenge (SMAC)

The peformances on SMAC datasets released in off-the-grid MARL benchmark.

Dataset Task BC QMIX MA-ICQ MA-CQL MADT MADiff-D MADiff-C*
Good 3m 16.0 13.8 18.8 19.6 19.0 19.6 20.0
Medium 3m 8.2 17.3 18.1 18.9 15.8 17.2 18.0
Poor 3m 4.4 10.0 14.4 5.8 4.2 8.9 9.3
Good 2s3z 18.2 5.9 19.6 19.0 19.3 19.4 19.5
Medium 2s3z 12.3 5.2 17.2 14.3 15.9 17.4 17.7
Poor 2s3z 6.7 3.8 12.1 10.1 7.0 9.9 10.8
Good 5m6m 16.6 8.0 16.3 13.8 16.8 18.0 18.2
Medium 5m6m 12.4 12.0 15.3 17.0 16.1 17.5 18.0
Poor 5m6m 7.5 10.7 9.4 10.4 7.6 8.9 9.5
Good 8m 16.7 4.6 19.6 11.3 18.5 19.2 20.0
Medium 8m 10.7 13.9 18.6 16.8 18.2 19.2 19.5
Poor 8m 5.3 6.0 10.8 4.6 4.8 5.1 5.2

* MADiff-C is not meant to be a fair comparison with baseline methods but to show if MADiff-D fills the gap for coordination without global information.

Setup

Installation

sudo apt-get update
sudo apt-get install libssl-dev libcurl4-openssl-dev swig
conda create -n madiff python=3.8
conda activate madiff
pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt

Setup MPE

We use the MPE dataset from OMAR. The dataset download link and instructions can be found in OMAR's repo. Since their BaiduPan download links might be inconvenient for non-Chinese users, we maintain a anonymous mirror repo in OSF for acquiring the dataset.

The downloaded dataset should be placed under diffuser/datasets/data/mpe.

Install MPE environment:

pip install -e third_party/multiagent-particle-envs
pip install -e third_party/ddpg-agent

Setup MA-Mujoco

  1. Install MA-Mujoco:

    pip install -e third_party/multiagent_mujoco
  2. We use the MA-Mujoco dataset from off-the-grid MARL. We preprocess the dataset to concatenate trajectories to full episodes and save them as .npy files for easier loading. The original dataset can be downloaded from links below.

  1. Install off-the-grid MARL and transform the original dataset.

    pip install -r ./third_party/og-marl/install_environments/requirements/mamujoco.txt
    pip install -e ./third_party/og-marl
    python scripts/transform_og_marl_dataset.py --env_name mamujoco --map_name <map> --quality <dataset>

Setup SMAC

  1. Run scripts/smac.sh to install StarCraftII.

  2. Install SMAC:

    pip install git+https://github.com/oxwhirl/smac.git
  3. We use the SMAC dataset from off-the-grid MARL. We preprocess the dataset to concatenate trajectories to full episodes and save them as .npy files for easier loading. The original dataset can be downloaded from links below.

  1. Install off-the-grid MARL and transform the original dataset.

    pip install -r ./third_party/og-marl/install_environments/requirements/smacv1.txt
    pip install -e ./third_party/og-marl
    python scripts/transform_og_marl_dataset.py --env_name smac --map_name <map> --quality <dataset>

Training and Evaluation

To start training, run the following commands

# multi-agent particle environment
python run_experiment.py -e exp_specs/mpe/<task>/mad_mpe_<task>_attn_<dataset>.yaml  # CTCE
python run_experiment.py -e exp_specs/mpe/<task>/mad_mpe_<task>_ctde_<dataset>.yaml  # CTDE
# ma-mujoco
python run_experiment.py -e exp_specs/mamujoco/<task>/mad_mamujoco_<task>_attn_<dataset>_history.yaml  # CTCE
python run_experiment.py -e exp_specs/mamujoco/<task>/mad_mamujoco_<task>_ctde_<dataset>_history.yaml  # CTDE
# smac
python run_experiment.py -e exp_specs/smac/<map>/mad_smac_<map>_attn_<dataset>_history.yaml  # CTCE
python run_experiment.py -e exp_specs/smac/<map>/mad_smac_<map>_ctde_<dataset>_history.yaml  # CTDE

To evaluate the trained model, first replace the log_dir with those need to be evaluated in exp_specs/eval_inv.yaml and run

python run_experiment.py -e exp_specs/eval_inv.yaml

Citation

@article{zhu2023madiff,
  title={MADiff: Offline Multi-agent Learning with Diffusion Models},
  author={Zhu, Zhengbang and Liu, Minghuan and Mao, Liyuan and Kang, Bingyi and Xu, Minkai and Yu, Yong and Ermon, Stefano and Zhang, Weinan},
  journal={arXiv preprint arXiv:2305.17330},
  year={2023}
}

Acknowledgements

The codebase is built upon decision-diffuser repo and ILSwiss.