marlbenchmark / on-policy

This is the official implementation of Multi-Agent PPO (MAPPO).
https://sites.google.com/view/mappo
MIT License

Cannot reproduce MPE simple_speaker_listener #109

Open AlbertoSinigaglia opened 1 month ago

AlbertoSinigaglia commented 1 month ago

I am trying to reproduce your results, but whenever I run the simple_speaker_listener script, it crashes with a mismatch in the number of values unpacked from the training sample. I am on your latest commit, but the error also occurs on older commits.

In particular, the error is the following:

Traceback (most recent call last):
  File "../train/train_mpe.py", line 175, in <module>
    main(sys.argv[1:])
  File "../train/train_mpe.py", line 160, in main
    runner.run()
  File ".../on-policy/onpolicy/runner/separated/mpe_runner.py", line 45, in run
    train_infos = self.train()
  File ".../on-policy/onpolicy/runner/separated/base_runner.py", line 162, in train
    train_info = self.trainer[agent_id].train(self.buffer[agent_id])
  File ".../on-policy/onpolicy/algorithms/r_mappo/r_mappo.py", line 207, in train
    = self.ppo_update(sample, update_actor)
  File ".../on-policy/onpolicy/algorithms/r_mappo/r_mappo.py", line 106, in ppo_update
    adv_targ, available_actions_batch = sample
ValueError: too many values to unpack (expected 12)
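
For anyone reading the traceback: the ValueError comes from Python tuple unpacking, not from a tensor shape check. A minimal sketch of this class of error follows, assuming only that the sampled tuple carries more fields than the 12 names on the left-hand side. The 13-element tuple and the `unpack_twelve` helper are hypothetical stand-ins, not code from the repository, and the real layout of `sample` is not shown in this thread.

```python
# Hypothetical stand-in for a sampled mini-batch. The real sample in
# r_mappo.py holds arrays/tensors; its exact layout is an assumption here.
sample = tuple(range(13))  # 13 fields, one more than the caller expects


def unpack_twelve(sample):
    """Unpack exactly 12 fields, mirroring the form of the failing statement."""
    (a, b, c, d, e, f, g, h, i, j, k, l) = sample
    return a, l


try:
    unpack_twelve(sample)
    message = None
except ValueError as err:
    # Python reports how many names the left-hand side expected.
    message = str(err)

print(message)
```

If the cause is of this kind, the sampler and the consumer disagree on the number of fields, so either side could be the one that changed.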

The script I am using is the following:

#!/bin/sh
env="MPE"
scenario="simple_speaker_listener"
num_landmarks=3
num_agents=2
algo="rmappo" #"mappo" "ippo"
exp="check"
seed_max=1

echo "env is ${env}, scenario is ${scenario}, algo is ${algo}, exp is ${exp}, max seed is ${seed_max}"
for seed in $(seq ${seed_max});
do
    echo "seed is ${seed}:"
    CUDA_VISIBLE_DEVICES=0 python ../train/train_mpe.py --env_name ${env} --algorithm_name ${algo} --experiment_name ${exp} \
    --scenario_name ${scenario} --num_agents ${num_agents} --num_landmarks ${num_landmarks} --seed ${seed} \
    --n_training_threads 1 --n_rollout_threads 128 --num_mini_batch 1 --episode_length 25 --num_env_steps 2000000 \
    --ppo_epoch 15 --gain 0.01 --lr 7e-4 --critic_lr 7e-4 --use_wandb 0 --wandb_name "xxx" --user_name "yuchao" --share_policy
done

Mendes-Gao commented 2 weeks ago

I ran into the same problem and am looking forward to the author's reply.

zoeyuchao commented 4 days ago

Hi all, we have fixed the bug. Please try the new code.