I am trying to reproduce your results, but whenever I run the simple_speaker_listener script, it crashes with what looks like a shape mismatch. I am currently on your latest commit, but the same error also occurs on older commits.
In particular, the error is the following:
Traceback (most recent call last):
  File "../train/train_mpe.py", line 175, in <module>
    main(sys.argv[1:])
  File "../train/train_mpe.py", line 160, in main
    runner.run()
  File ".../on-policy/onpolicy/runner/separated/mpe_runner.py", line 45, in run
    train_infos = self.train()
  File ".../on-policy/onpolicy/runner/separated/base_runner.py", line 162, in train
    train_info = self.trainer[agent_id].train(self.buffer[agent_id])
  File ".../on-policy/onpolicy/algorithms/r_mappo/r_mappo.py", line 207, in train
    = self.ppo_update(sample, update_actor)
  File ".../on-policy/onpolicy/algorithms/r_mappo/r_mappo.py", line 106, in ppo_update
    adv_targ, available_actions_batch = sample
ValueError: too many values to unpack (expected 12)
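My reading of the message is that `sample` contains more than 12 elements, while `ppo_update` unpacks it into exactly 12 names. Below is a minimal sketch of the same kind of failure, independent of the repository. The function names and the 13-element tuple are assumptions made up for illustration and are not taken from the code base.

```python
def unpack_twelve(sample):
    # Hypothetical stand-in for the unpacking in ppo_update: exactly 12 targets.
    a, b, c, d, e, f, g, h, i, j, k, last = sample
    return last


def try_unpack(sample):
    # Return "ok" on success, otherwise the ValueError message as a string.
    try:
        unpack_twelve(sample)
        return "ok"
    except ValueError as err:
        return str(err)


# A 12-element tuple unpacks cleanly.
print(try_unpack(tuple(range(12))))
# A 13-element tuple raises ValueError; the message begins with
# "too many values to unpack (expected 12".
print(try_unpack(tuple(range(13))))
```

If this reading is right, the buffer's sample generator and `ppo_update` disagree on how many fields a sample contains, but I have not confirmed which side is at fault.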
The script I am using is the following:
#!/bin/sh
env="MPE"
scenario="simple_speaker_listener"
num_landmarks=3
num_agents=2
algo="rmappo" #"mappo" "ippo"
exp="check"
seed_max=1
echo "env is ${env}, scenario is ${scenario}, algo is ${algo}, exp is ${exp}, max seed is ${seed_max}"
for seed in `seq ${seed_max}`;
do
    echo "seed is ${seed}:"
    CUDA_VISIBLE_DEVICES=0 python ../train/train_mpe.py --env_name ${env} --algorithm_name ${algo} --experiment_name ${exp} \
    --scenario_name ${scenario} --num_agents ${num_agents} --num_landmarks ${num_landmarks} --seed ${seed} \
    --n_training_threads 1 --n_rollout_threads 128 --num_mini_batch 1 --episode_length 25 --num_env_steps 2000000 \
    --ppo_epoch 15 --gain 0.01 --lr 7e-4 --critic_lr 7e-4 --use_wandb 0 --wandb_name "xxx" --user_name "yuchao" --share_policy
done