sjtu-marl / malib

A parallel framework for population-based multi-agent reinforcement learning.
https://malib.io
MIT License

Performance Results #35

Open KornbergFresnel opened 2 years ago

KornbergFresnel commented 2 years ago

Throughput Comparison

All experiment results listed were obtained with one of the following hardware settings: (1) System #1: a 32-core computing node with two graphics cards; (2) System #2: a two-node cluster, each node with 128 cores and a single graphics card. All GPUs mentioned are the same model (NVIDIA RTX 3090).

Throughput comparison between existing RL frameworks and MALib. Due to resource limitations (32 cores, 256 GB RAM), RLlib fails under heavy loads (CPU case: #workers > 32; GPU case: #workers > 8). MALib outperforms the other frameworks in the CPU-only setting and, despite the higher level of abstraction it introduces, achieves performance comparable to the highly tailored Sample-Factory framework in the GPU setting. To better illustrate MALib's scalability, we show MA-Atari and SC2 throughput on System #2 under different worker settings; the 512-worker group on SC2 fails due to resource limitations.

[Figure: merged_throughput_report]
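For reference, throughput here means environment frames collected per second. Below is a minimal single-process sketch of how such a number can be estimated, assuming Gymnasium-style environments; `estimate_throughput` and its parameters are illustrative names, not part of any of the compared frameworks, which distribute this loop across many parallel workers.

```python
import time
import gymnasium as gym

def estimate_throughput(make_env, num_envs=4, num_steps=10_000):
    """Roughly estimate sampling throughput (env frames per second)
    by stepping a batch of environments with random actions."""
    envs = [make_env() for _ in range(num_envs)]
    for env in envs:
        env.reset()
    frames = 0
    start = time.perf_counter()
    for _ in range(num_steps):
        for env in envs:
            # Gymnasium step API: obs, reward, terminated, truncated, info
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            frames += 1
            if terminated or truncated:
                env.reset()
    return frames / (time.perf_counter() - start)

# Example: single-process sampling rate on a toy task.
print(estimate_throughput(lambda: gym.make("CartPole-v1")))
```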

Additional comparisons between MALib and other distributed RL training frameworks. (Left) Cluster throughput of MALib on System #3 in 2-player MA-Atari and 3-player SC2. (Middle) 4-player MA-Atari throughput comparison on System #1 without GPU. (Right) 4-player MA-Atari throughput comparison on System #1 with GPU.

[Figure: merged_throughput_report_4p]

Wall-time & Performance of PB-MARL Algorithm

Comparisons of PSRO between MALib and OpenSpiel. (a) MALib achieves the same exploitability as OpenSpiel; (b) MALib converges 3x faster than OpenSpiel; (c) MALib achieves higher execution efficiency than OpenSpiel, requiring less wall-clock time for the same number of learning steps, which suggests MALib can scale to more complex tasks that must run for many more steps.

[Figure: pb-marl_wall_time]
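For context, exploitability in panel (a) measures how much a best responder can gain against the current meta-strategy; it is zero at a Nash equilibrium. A minimal sketch of evaluating it with OpenSpiel's built-in routine (the choice of Kuhn poker and the uniform tabular policy are illustrative, not the experiment's actual setup):

```python
import pyspiel
from open_spiel.python.policy import TabularPolicy
from open_spiel.python.algorithms import exploitability

# Load a small zero-sum game and evaluate a tabular policy,
# which TabularPolicy initializes to uniform random.
game = pyspiel.load_game("kuhn_poker")
policy = TabularPolicy(game)

# Exploitability quantifies how much a best responder gains
# against this policy; PSRO drives this value toward zero.
print(exploitability.exploitability(game, policy))
```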

Typical MARL Algorithms

Results on Multi-agent Particle Environments

Comparison of MADDPG on simple_adversary under different rollout-worker settings. The top row plots each agent's episode reward against the number of sampled episodes, showing that MALib converges faster than RLlib for the same number of sampled episodes. The bottom row shows average wall-clock time and average episode reward at the same number of sampled episodes, indicating that MALib achieves a 5x speedup over RLlib.

[Figure: simple_adversary]
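The scenarios in this and the following subsections come from the Multi-agent Particle Environments. A minimal sketch of collecting per-agent episode rewards on simple_adversary, using PettingZoo's MPE port with random actions (the PettingZoo version and the random policy are assumptions for illustration, not MALib's own rollout API):

```python
from pettingzoo.mpe import simple_adversary_v3

# Roll out one episode on simple_adversary and accumulate
# each agent's episode reward, as plotted in the figure above.
env = simple_adversary_v3.parallel_env(max_cycles=25)
observations, infos = env.reset(seed=0)

episode_reward = {agent: 0.0 for agent in env.agents}
while env.agents:
    # Random placeholder policy; MADDPG would supply actions here.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    for agent, r in rewards.items():
        episode_reward[agent] += r

print(episode_reward)
```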

Simple Crypto

[Figure: simple_crypto]

Simple Push

[Figure: simple_push]

Simple Reference

[Figure: simple_reference]

Simple Speaker Listener

[Figure: simple_speaker_listener]

Simple Tag

[Figure: simple_tag]