jidiai / GRF_MARL

Google Research Football MARL Benchmark and Research Toolkit
https://grf-marl.readthedocs.io/

Non-distributed training #5

Open GP413413 opened 1 month ago

GP413413 commented 1 month ago

Hello! I found that this project defaults to distributed training using the ray framework. Can GRF_MARL also support non-distributed training (i.e. on a single server), and where should I modify the settings? Thank you for your help!

YanSong97 commented 1 month ago

Hi, you can modify the config file to customize rollout and training. For example, ${rollout_manager.num_workers} sets the number of CPU cores used for environment rollout (one env instance per CPU by default), and ${training_manager.num_trainers} equals the number of GPUs used for training.
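
As a rough sketch of how these two keys might look inside an experiment config (the exact nesting is an assumption inferred from the ${...} key paths above, not taken from the repo):

```yaml
# Hypothetical excerpt of an experiment config such as ippo.yaml.
# Only the two key names come from this thread; the surrounding
# structure is assumed from the ${rollout_manager.num_workers}
# and ${training_manager.num_trainers} paths.
rollout_manager:
  num_workers: 10     # CPU cores for rollout; one env instance per CPU
training_manager:
  num_trainers: 2     # number of GPUs used for training
```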

Also feel free to look at a ray-free demo: light_malib/scripts/run_train.py .

GP413413 commented 1 month ago

Thank you for your advice! Do you mean that training with light_malib/scripts/run_train.py is an alternative to light_malib/main_pbt.py that avoids the ray framework? I tried to run an experiment with:

python3 light_malib/scripts/run_train.py --config expr_configs/cooperative_MARL_benchmark/full_game/11_vs_11_hard/ippo.yaml

But I immediately got the following result: [screenshot of error output]

Where should I modify the config file?

YanSong97 commented 1 month ago

Yes, this is a way to debug without any distributed execution.

The config file is located at expr_configs/cooperative_MARL_benchmark/full_game/11_vs_11_hard/ippo.yaml. You can try modifying ${rollout_manager.num_workers} and ${training_manager.num_trainers} there when running in distributed execution mode (e.g. via main_pbt.py).
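
For a single-server run, a minimal setting might look like the sketch below; again, the YAML nesting is an assumption based on the key paths, and only the key names come from this thread:

```yaml
# Hypothetical minimal single-server settings in ippo.yaml:
rollout_manager:
  num_workers: 1    # one CPU worker collecting rollouts
training_manager:
  num_trainers: 1   # one GPU used for training
```

With both values at 1, the distributed path (main_pbt.py) should keep its resource usage within a single machine.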

GP413413 commented 1 month ago

Ok, thank you for your advice. I will give it a try.