openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

How can I use it for "simple_world_comm" in MPE? ---- "AssertionError: nvec should be a 1d array (or list) of ints" #10

Closed zimoqingfeng closed 6 years ago

zimoqingfeng commented 6 years ago

I am focusing on the implementation of "maddpg", but an error occurs.

PyCharm showed:

Traceback (most recent call last):
  File "/home/zimoqingfeng/rlSource/maddpg/experiments/train.py", line 195, in <module>
    train(arglist)
  File "/home/zimoqingfeng/rlSource/maddpg/experiments/train.py", line 83, in train
    env = make_env(arglist.scenario, arglist, arglist.benchmark)
  File "/home/zimoqingfeng/rlSource/maddpg/experiments/train.py", line 62, in make_env
    env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)
  File "/home/zimoqingfeng/rlSource/multiagent-particle-envs/multiagent/environment.py", line 60, in __init__
    act_space = spaces.MultiDiscrete([[0, act_space.n-1] for act_space in total_action_space])
  File "/home/zimoqingfeng/rlSource/gym/gym/spaces/multi_discrete.py", line 10, in __init__
    assert self.nvec.ndim == 1, 'nvec should be a 1d array (or list) of ints'
AssertionError: nvec should be a 1d array (or list) of ints
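If I read the assertion correctly, newer gym releases expect MultiDiscrete to be built from a 1-D list of category counts (nvec), while environment.py still passes the older list of [min, max] pairs. A minimal illustration of the two styles (the numbers here are just placeholders):

from gym import spaces

# Newer gym API: one integer per action dimension, giving the number of discrete choices
ok = spaces.MultiDiscrete([5, 3])               # nvec is 1-D -> accepted

# What environment.py builds: a list of [min, max] ranges (the older style)
# bad = spaces.MultiDiscrete([[0, 4], [0, 2]])  # nvec would be 2-D -> the AssertionError above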

For your information, here are my argument settings:

import argparse

def parse_args():
    parser = argparse.ArgumentParser("Reinforcement Learning experiments for multiagent environments")
    # Environment
    parser.add_argument("--scenario", type=str, default="simple_world_comm", help="name of the scenario script")
    parser.add_argument("--max-episode-len", type=int, default=25, help="maximum episode length")
    parser.add_argument("--num-episodes", type=int, default=60000, help="number of episodes")
    parser.add_argument("--num-adversaries", type=int, default=0, help="number of adversaries")
    parser.add_argument("--good-policy", type=str, default="maddpg", help="policy for good agents")
    parser.add_argument("--adv-policy", type=str, default="maddpg", help="policy of adversaries")
    # Core training parameters
    parser.add_argument("--lr", type=float, default=1e-2, help="learning rate for Adam optimizer")
    parser.add_argument("--gamma", type=float, default=0.95, help="discount factor")
    parser.add_argument("--batch-size", type=int, default=1024, help="number of episodes to optimize at the same time")
    parser.add_argument("--num-units", type=int, default=64, help="number of units in the mlp")
    # Checkpointing
    parser.add_argument("--exp-name", type=str, default="001", help="name of the experiment")
    parser.add_argument("--save-dir", type=str, default="./tmp/policy_simple_world_comm/", help="directory in which training state and model should be saved")
    parser.add_argument("--save-rate", type=int, default=1000, help="save model once every time this many episodes are completed")
    parser.add_argument("--load-dir", type=str, default="", help="directory in which training state and model are loaded")
    # Evaluation
    parser.add_argument("--restore", action="store_true", default=False)
    parser.add_argument("--display", action="store_true", default=False)
    parser.add_argument("--benchmark", action="store_true", default=False)
    parser.add_argument("--benchmark-iters", type=int, default=100000, help="number of iterations run for benchmarking")
    parser.add_argument("--benchmark-dir", type=str, default="./benchmark_files/", help="directory where benchmark data is saved")
    parser.add_argument("--plots-dir", type=str, default="./learning_curves/", help="directory where plot data is saved")
    return parser.parse_args()
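For completeness: train.py lives in the experiments directory, so with the defaults above the error can presumably be reproduced with nothing more than:

cd maddpg/experiments
python train.py    # scenario defaults to simple_world_comm; env creation fails before training starts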

Thank you, and I look forward to your reply!

zimoqingfeng commented 6 years ago

Solved the problem by using an older version of MPE and an older version of gym (version < 0.10.0). FYI, the TensorFlow version can be chosen freely.
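For anyone who prefers to stay on a newer gym: judging from the assertion, the failing line in multiagent/environment.py could presumably be rewritten in the 1-D nvec form instead, as in the sketch below. I have not verified this, and other parts of MPE/maddpg may still expect the old MultiDiscrete interface, so downgrading as described above is the safer route.

# In multiagent/environment.py (sketch, unverified):
# old style, list of [min, max] pairs:
#   act_space = spaces.MultiDiscrete([[0, act_space.n - 1] for act_space in total_action_space])
# newer style, one size (number of choices) per sub-action:
act_space = spaces.MultiDiscrete([act_space.n for act_space in total_action_space])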