nsidn98 / InforMARL

Code for our paper: Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation
https://nsidn98.github.io/InforMARL/
MIT License

Scalability of InforMARL #14

Closed · QMBX closed this issue 1 month ago

QMBX commented 1 month ago

Hello @nsidn98, thank you for open-sourcing the paper's code. When I tried to run the program to reproduce the experimental results in the paper, I couldn't find the code for the scalability experiments. The experimental setup is described in the paper:

The number of obstacles in the environment is randomly chosen from (0, 10) at the beginning of the episode.

But I can't find anything in the code about random obstacles. Can you tell me how to set this up, or which parts of the code I should change? Thank you.

nsidn98 commented 1 month ago

Hi @QMBX,

You can use the --num_obstacles flag.

We created a test submission bash script where we randomly chose the number of obstacles.

python -u onpolicy/scripts/train_mpe.py --use_valuenorm --use_popart \
--project_name "informarl" \
--env_name "GraphMPE" \
--algorithm_name "rmappo" \
--seed 0 \
--experiment_name "informarl" \
--scenario_name "navigation_graph" \
--num_agents 3 \
--num_obstacles $((1 + $RANDOM % 10)) \
--collision_rew 5 \
--n_training_threads 1 --n_rollout_threads 128 \
--num_mini_batch 1 \
--episode_length 25 \
--num_env_steps 2000000 \
--ppo_epoch 10 --use_ReLU --gain 0.01 --lr 7e-4 --critic_lr 7e-4 \
--user_name "marl" \
--use_cent_obs "False" \
--graph_feat_type "relative" \
--auto_mini_batch_size --target_mini_batch_size 128
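Note that $((1 + $RANDOM % 10)) samples an integer from 1 to 10. If you also want to allow 0 obstacles (matching the "(0, 10)" range quoted from the paper), a small bash tweak like the one below should work; this is just a sketch of the shell arithmetic, not a flag or helper provided by the repo:

# Sample the obstacle count roughly uniformly from 0..10
# (bash's $RANDOM is a built-in returning an integer in 0..32767)
NUM_OBSTACLES=$(($RANDOM % 11))
echo "Sampled ${NUM_OBSTACLES} obstacles"
# then pass it to the training script as: --num_obstacles ${NUM_OBSTACLES}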

I hope this answers your question.

Thanks! Sid

QMBX commented 1 month ago


Thank you very much for your reply; I'm sorry I'm seeing it a day late. So when I try to reproduce the table in the "Scalability of InforMARL" section, do I need to run multiple training experiments with the same number of agents but different --num_obstacles settings, then evaluate those models and average the results? And are different --num_obstacles values also used during evaluation?

The table I'm referring to is the one in the "Scalability of InforMARL" section of the paper: [table image]

nsidn98 commented 1 month ago

Oh, you meant those experiments! You don't need to train separate policies for different numbers of agents and obstacles. You can just train on $n$ agents and $d$ obstacles and test on $m \neq n$ agents and $f \neq d$ obstacles. You can execute the test script with:

python onpolicy/scripts/eval_mpe.py \
--model_dir=<add_file_path_to_saved_weights_folder> \
--render_episodes=1 \
--num_agents=3 \
--num_obstacles=$((1 + $RANDOM % 10)) \
--seed=1 \
--episode_length=25 \
--use_dones=False --collaborative=True \
--scenario_name='navigation_graph' --save_gifs --use_render
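To fill in the different rows of that table, a simple wrapper like the sketch below would re-run the eval script for several agent counts, re-sampling the obstacle count each time; you can then average the reported metrics offline. The agent counts and number of repeats here are placeholders (not the paper's exact setup), and the rendering/gif flags are omitted since they aren't needed for batch runs:

# Sketch: evaluate one trained checkpoint across several agent counts,
# re-sampling the obstacle count (0..10) for every run.
for NUM_AGENTS in 3 7 15; do
    for SEED in 1 2 3; do
        python onpolicy/scripts/eval_mpe.py \
            --model_dir=<add_file_path_to_saved_weights_folder> \
            --render_episodes=1 \
            --num_agents=${NUM_AGENTS} \
            --num_obstacles=$(($RANDOM % 11)) \
            --seed=${SEED} \
            --episode_length=25 \
            --use_dones=False --collaborative=True \
            --scenario_name='navigation_graph'
    done
done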

I hope this answers your question!

QMBX commented 1 month ago

I see. Before this, I had treated the 0–10 obstacle setting as part of the environment configuration during the training phase.

Thank you very much for your help!

nsidn98 commented 1 month ago

Glad that it got resolved. Closing this issue now. Please re-open if the issue persists.

Thanks, Sid