nsidn98 / InforMARL

Code for our paper: Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation
https://nsidn98.github.io/InforMARL/
MIT License

How do we render the training and evaluation? #6

Closed Michael-Elrod-dev closed 2 weeks ago

Michael-Elrod-dev commented 3 months ago

Hello @nsidn98, can you please explain how to run the program while rendering all training and testing episodes?

I have tried setting the flags --use_render "True" --render_eval "True".

nsidn98 commented 3 months ago

Hey! Do you want to render the episode using a trained policy?

Michael-Elrod-dev commented 3 months ago

Thank you for the quick reply. What I would like to do is render all the training episodes. Once training is completed and I have the trained policy, I would like to render all of those episodes as well (likely in the navigation_graph.py env). To train, I am using the command-line input provided in the README that uses the GraphMPE env.

nsidn98 commented 3 months ago

Gotcha! What are you getting when you set the flags --use_render "True" --render_eval "True"? Are you not getting a directory with all the GIFs stored?

Michael-Elrod-dev commented 3 months ago

So, I tried to run with the following command:

python -u onpolicy/scripts/train_mpe.py --use_valuenorm --use_popart --project_name "informarl" --env_name "GraphMPE" --algorithm_name "rmappo" --seed 0 --experiment_name "informarl" --scenario_name "navigation_graph" --num_agents 3 --collision_rew 5 --n_training_threads 1 --n_rollout_threads 128 --num_mini_batch 1 --episode_length 25 --num_env_steps 2000000 --ppo_epoch 10 --use_ReLU --gain 0.01 --lr 7e-4 --critic_lr 7e-4 --user_name "elrod-michael95" --use_cent_obs "False" --graph_feat_type "relative" --auto_mini_batch_size --target_mini_batch_size 128 --save_gifs "True" --use_render "True" --render_eval "True"

Note that I have set --save_gifs "True" --use_render "True" --render_eval "True"

After creating the wandb dashboard and printing the network architecture, I get the following error:

Traceback (most recent call last):
  File "onpolicy/scripts/train_mpe.py", line 315, in <module>
    main(sys.argv[1:])
  File "onpolicy/scripts/train_mpe.py", line 300, in main
    runner.run()
  File "C:\Users\elrod\OneDrive\Code\InforMARL - Copy\onpolicy\runner\shared\graph_mpe_runner.py", line 85, in run
    self.save()
  File "C:\Users\elrod\OneDrive\Code\InforMARL - Copy\onpolicy\runner\shared\base_runner.py", line 176, in save
    torch.save(policy_actor.state_dict(), str(self.save_dir) + "/actor.pt")
AttributeError: 'GMPERunner' object has no attribute 'save_dir'

Does this mean I cannot render while using wandb? The source of the error is here:

        # if not testing model
        if not self.use_render: # HERE
            if self.use_wandb:
                self.save_dir = str(wandb.run.dir)
                self.run_dir = str(wandb.run.dir)
            else:
                self.run_dir = config["run_dir"]
                self.log_dir = str(self.run_dir / "logs")
                if not os.path.exists(self.log_dir):
                    os.makedirs(self.log_dir)
                self.writter = SummaryWriter(self.log_dir)
                self.save_dir = str(self.run_dir / "models")
                if not os.path.exists(self.save_dir):
                    os.makedirs(self.save_dir)
nsidn98 commented 3 months ago

Yes, we do not render while training because rendering images/frames takes a lot of time, which slows down the training process (and also because we use multiple rollout threads for the environment). But you can use it while testing/evaluating.
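
A minimal workaround sketch, assuming you want checkpoints saved even with --use_render set: guard the periodic self.save() call in GMPERunner.run() (line 85 in the traceback above) the same way base_runner.py guards the creation of save_dir.

    # Hypothetical guard in GMPERunner.run(): base_runner.py only sets
    # self.save_dir when use_render is False, so skip the periodic save
    # in render mode to avoid the AttributeError above.
    if not self.use_render:
        self.save()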

Michael-Elrod-dev commented 3 months ago

Ah I see, thank you for clarifying. In that case, do you have an example of the command-line arguments needed for testing and rendering the trained policy using the navigation_graph.py environment? I am assuming this will be done through the main function in eval_mpe.py?

nsidn98 commented 3 months ago

Something like this (with appropriate changes in --model_dir):

python onpolicy/scripts/eval_mpe.py \
    --model_dir='my_saved_wt_dir_path' \
    --render_episodes=100 --world_size=3 --num_agents=3 --num_obstacles=0 --seed=2 --num_landmarks=3 --episode_length=25 \
    --use_dones=False --collaborative=False \
    --scenario_name='navigation_graph' --goal_rew=10 --fair_rew=5 --save_gifs --use_render --num_walls=0
nsidn98 commented 3 months ago

Can you check your yaml version and let me know what it is (python -c 'import yaml; print(yaml.__version__)')?

Also, can you show me the folder structure of your model_dir?

Michael-Elrod-dev commented 3 months ago

So, my version of pyyaml is 6.0.1. I was able to bypass the yaml error by adding a Loader parameter, though of course I am not sure if that is appropriate: ydict = yaml.load(f, Loader=yaml.SafeLoader)
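
For context, PyYAML 6.0 made the Loader argument to yaml.load() mandatory, so adding Loader=yaml.SafeLoader is a reasonable fix; yaml.safe_load(f) is shorthand for the same call. A minimal sketch of reading the wandb-generated config (the exact config.yaml path is an assumption based on the run directory used below):

    import yaml

    # PyYAML >= 6.0: yaml.load() requires an explicit Loader;
    # yaml.safe_load(f) is equivalent to yaml.load(f, Loader=yaml.SafeLoader).
    with open("wandb/run-20240305_132329-m5gd1v8e/files/config.yaml") as f:  # assumed path
        ydict = yaml.safe_load(f)

    # wandb nests every training argument under a "value" key,
    # as in the config dump later in this thread.
    print(ydict["num_agents"]["value"])  # e.g. 3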

After this I ran the program with the following input: python onpolicy/scripts/eval_mpe.py --model_dir='onpolicy/results/GraphMPE/navigation_graph/rmappo/informarl/wandb/run-20240305_132329-m5gd1v8e/files' --render_episodes=100 --world_size=3 --num_agents=3 --num_obstacles=0 --seed=2 --num_landmarks=3 --episode_length=25 --use_dones=False --collaborative=False --scenario_name='navigation_graph' --goal_rew=10 --fair_rew=5 --save_gifs --use_render --num_walls=0

No errors are produced; however, the program only runs for a few moments and then prints four separate lists to the terminal that look like:

[-61.12262167799693, -71.25237291446483, -139.65972281108307, -126....
[1.0, 0.88, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0....
[False, False, False, False, False, False, False....
[0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....

My file structure is roughly identical to the one provided in the repo:

[screenshot of the model_dir folder structure]

nsidn98 commented 3 months ago

Did it not save any GIFs in that folder? Ideally it should if you were able to run the code to termination without any errors for 100 episodes.

Michael-Elrod-dev commented 3 months ago

After running the program with the same command as above, there are no .gif files to be found in the project directory, and the program prints the following to the terminal without errors:

__________________________________________________
Using model_dir = onpolicy/results/GraphMPE/navigation_graph/rmappo/informarl/wandb/run-20240305_132329-m5gd1v8e/files
Using num_agents = 3
Using num_landmarks = 3
Using num_obstacles = 0
Using render_episodes = 100
Using seed = 2
Using world_size = 3.0
__________________________________________________
<onpolicy.envs.env_wrappers.GraphDummyVecEnv object at 0x00000237AE552D60>
Overriding Observation dimension
Overriding Observation dimension
Restoring from checkpoint stored in onpolicy/results/GraphMPE/navigation_graph/rmappo/informarl/wandb/run-20240305_132329-m5gd1v8e/files
C:\Users\elrod\anaconda3\envs\OneMore\lib\site-packages\torch_geometric\deprecation.py:12: UserWarning: 'data.DataLoader' is deprecated, use 'loader.DataLoader' instead
  warnings.warn(out)
[-61.12262167799693, -71.25237291446483, -139.65972281108307, -126.43046475095089, -117.63275586215859, -57.04380748916634, -232.64036143038302, -92.43584630682903, -151.94816002613322, -99.3599009013733, -72.42360793378744, -86.24318523420767, -122.92327068352945, -97.81961511443133, -91.73460503571124, -86.94983151310464, -83.33720656715961, -136.4803940504792, -127.88219884943236, -91.95317200720973, -136.55174813085952, -52.034000835382095, -211.67891209487223, -152.3154138186259, -126.95701503158982, -79.75732169736919, -133.5423368624323, -114.88915575713725, -57.2810846687639, -150.76213944522055, -119.34410136729832, -75.81306786544391, -116.04915215501809, -127.88830271082088, -109.9612420008544, -183.90089676204033, -125.00132897512758, -115.32362676602266, -118.90595837415712, -106.1862875948941, -52.17172539505564, -127.56497712980371, -111.38674101088395, -119.213542370468, -78.94857381141237, -145.7116405570003, -68.05248438207639, -84.65764604662057, -132.43159151292593, -74.73693933927339, -169.41764988955168, -138.08757843889993, -113.74243458240964, -99.69838452652131, -122.77922974214754, -88.59998264185974, -182.03954224147614, -113.94546359753302, -120.67303720412288, -103.54838042587515, -102.73559771168182, -75.65335636053128, -125.65505249354489, -193.53023512822392, -170.71257289570266, -99.17807127121223, -107.32804708166587, -47.24203618799539, -232.52005658574703, -182.37103477341773, -107.84076859426979, -91.03363216564789, -174.48574828365267, -63.113043477777516, -134.82459957788825, -116.45652185579469, -275.96990507255254, -78.32512823096943, -87.50001858350898, -215.2875332805917, -101.27672094809884, -58.44111396886604, -52.6314875854217, -184.96012540282882, -66.03617596987138, -104.38190459645695, -173.41129387441404, -116.72911523105647, -53.03281917769047, -138.28595508618818, -128.17096768519139, -163.1478009330355, -94.8247339377099, -135.20311222281032, -124.62862254815542, -105.6497315938305, -90.48346641936712, -83.43314476804808, -149.62393171241376, -67.83984543831114]
[1.0, 0.88, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.8133333333333334, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.7600000000000001, 1.0, 1.0, 1.0, 0.7733333333333333, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9199999999999999, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.84, 1.0, 1.0, 1.0, 1.0, 1.0, 0.7600000000000001, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.84, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9333333333333332]
[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False]
[0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
nsidn98 commented 3 months ago

Hmmm, that is weird. Ideally, it should save the GIFs in the same folder as the model_dir.

Relevant code is here: https://github.com/nsidn98/InforMARL/blob/304e905d05b34d9bf06046eb7e03904b97a14231/onpolicy/runner/shared/graph_mpe_runner.py#L470C6-L476C18

Michael-Elrod-dev commented 3 months ago

Ah, thank you for pointing that out. The issue was that when we call this render function, it was set up to pass True as the get_metrics parameter, which bypasses the rendering shown here: https://github.com/nsidn98/InforMARL/blob/304e905d05b34d9bf06046eb7e03904b97a14231/onpolicy/scripts/eval_mpe.py#L222C5-L222C24
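
Put differently, a one-line sketch (assuming the render signature linked above; the runner variable is illustrative):

    # get_metrics=True only computes episode statistics and skips frame
    # capture; passing False makes the runner collect frames and, with
    # --save_gifs set, write the GIFs alongside model_dir.
    runner.render(get_metrics=False)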

Thank you for your help.

Yu-zx commented 2 weeks ago

I would like to ask what such a yaml file should look like. I tried but unfortunately failed.

nsidn98 commented 2 weeks ago

The yaml file is automatically generated by wandb during training and looks something like this:

wandb_version: 1

_wandb:
  desc: null
  value:
    cli_version: 0.10.31
    code_path: code/onpolicy/scripts/train_mpe.py
    framework: huggingface
    is_jupyter_run: false
    is_kaggle_kernel: false
    python_version: 3.8.13
    t:
      1:
      - 1
      2:
      - 1
      - 9
      - 11
      3:
      - 2
      - 4
      4: 3.8.13
      5: 0.10.31
      8:
      - 5
actor_graph_aggr:
  desc: null
  value: node
algorithm_name:
  desc: null
  value: rmappo
auto_mini_batch_size:
  desc: null
  value: true
clip_param:
  desc: null
  value: 0.2
collaborative:
  desc: null
  value: true
collision_rew:
  desc: null
  value: 5.0
critic_graph_aggr:
  desc: null
  value: global
critic_lr:
  desc: null
  value: 0.0007
cuda:
  desc: null
  value: true
cuda_deterministic:
  desc: null
  value: true
data_chunk_length:
  desc: null
  value: 10
embed_add_self_loop:
  desc: null
  value: false
embed_hidden_size:
  desc: null
  value: 16
embed_layer_N:
  desc: null
  value: 1
embed_use_ReLU:
  desc: null
  value: true
embedding_size:
  desc: null
  value: 2
entropy_coef:
  desc: null
  value: 0.01
env_name:
  desc: null
  value: GraphMPE
episode_length:
  desc: null
  value: 25
eval_episodes:
  desc: null
  value: 32
eval_interval:
  desc: null
  value: 25
experiment_name:
  desc: null
  value: informarl_False_relative
gae_lambda:
  desc: null
  value: 0.95
gain:
  desc: null
  value: 0.01
gamma:
  desc: null
  value: 0.99
global_aggr_type:
  desc: null
  value: mean
gnn_concat_heads:
  desc: null
  value: false
gnn_hidden_size:
  desc: null
  value: 16
gnn_layer_N:
  desc: null
  value: 2
gnn_num_heads:
  desc: null
  value: 3
gnn_use_ReLU:
  desc: null
  value: true
goal_rew:
  desc: null
  value: 5
graph_feat_type:
  desc: null
  value: relative
hidden_size:
  desc: null
  value: 64
huber_delta:
  desc: null
  value: 10.0
ifi:
  desc: null
  value: 0.1
layer_N:
  desc: null
  value: 1
log_interval:
  desc: null
  value: 5
lr:
  desc: null
  value: 0.0007
max_batch_size:
  desc: null
  value: 32
max_edge_dist:
  desc: null
  value: 1
max_grad_norm:
  desc: null
  value: 10.0
max_speed:
  desc: null
  value: 2
min_dist_thresh:
  desc: null
  value: 0.05
model_dir:
  desc: null
  value: null
n_eval_rollout_threads:
  desc: null
  value: 1
n_render_rollout_threads:
  desc: null
  value: 1
n_rollout_threads:
  desc: null
  value: 128
n_training_threads:
  desc: null
  value: 1
num_agents:
  desc: null
  value: 3
num_embeddings:
  desc: null
  value: 3
num_env_steps:
  desc: null
  value: 2000000
num_landmarks:
  desc: null
  value: 3
num_mini_batch:
  desc: null
  value: 75
num_nbd_entities:
  desc: null
  value: 3
num_obstacles:
  desc: null
  value: 3
num_scripted_agents:
  desc: null
  value: 0
obs_type:
  desc: null
  value: global
opti_eps:
  desc: null
  value: 1.0e-05
ppo_epoch:
  desc: null
  value: 10
project_name:
  desc: null
  value: compare_3
recurrent_N:
  desc: null
  value: 1
render_episodes:
  desc: null
  value: 5
save_gifs:
  desc: null
  value: false
save_interval:
  desc: null
  value: 1
scenario_name:
  desc: null
  value: navigation_graph
seed:
  desc: null
  value: 3
share_policy:
  desc: null
  value: true
split_batch:
  desc: null
  value: false
stacked_frames:
  desc: null
  value: 1
target_mini_batch_size:
  desc: null
  value: 128
use_ReLU:
  desc: null
  value: false
use_cent_obs:
  desc: null
  value: false
use_centralized_V:
  desc: null
  value: true
use_clipped_value_loss:
  desc: null
  value: true
use_comm:
  desc: null
  value: false
use_dones:
  desc: null
  value: false
use_eval:
  desc: null
  value: false
use_feature_normalization:
  desc: null
  value: true
use_gae:
  desc: null
  value: true
use_huber_loss:
  desc: null
  value: true
use_linear_lr_decay:
  desc: null
  value: false
use_max_grad_norm:
  desc: null
  value: true
use_naive_recurrent_policy:
  desc: null
  value: false
use_obs_instead_of_state:
  desc: null
  value: false
use_orthogonal:
  desc: null
  value: true
use_policy_active_masks:
  desc: null
  value: true
use_popart:
  desc: null
  value: true
use_proper_time_limits:
  desc: null
  value: false
use_recurrent_policy:
  desc: null
  value: true
use_render:
  desc: null
  value: false
use_stacked_frames:
  desc: null
  value: false
use_value_active_masks:
  desc: null
  value: true
use_valuenorm:
  desc: null
  value: false
use_wandb:
  desc: null
  value: true
user_name:
  desc: null
  value: marl
value_loss_coef:
  desc: null
  value: 1
verbose:
  desc: null
  value: true
weight_decay:
  desc: null
  value: 0
world_size:
  desc: null
  value: 2

@Yu-zx, can you show me what command you tried executing and what error you got?

Yu-zx commented 2 weeks ago

Thanks!

nsidn98 commented 2 weeks ago

@Yu-zx closing this issue as it seems to be resolved. Please open a new issue if you are still facing issues.