oxwhirl / smacv2

Failed to reproduce the result. #18

Closed demo111demo closed 1 year ago

demo111demo commented 1 year ago

First of all, thank you very much for your contribution. I tried to use the code from https://github.com/benellis3/pymarl2 to reproduce the results in the paper with the following command:

python3 src/main.py --config=qmix --env-config=sc2_gen_protoss with env_args.capability_config.n_units=5 env_args.capability_config.start_positions.n_enemies=5

However, on Protoss 5_vs_5 the final test win rate after 10M steps is ~20%, and on Protoss 10_vs_10 it is ~15% after 10M steps, both far lower than the results in the paper. Could you please tell me if I missed any details? Or am I running in open-loop mode?

demo111demo commented 1 year ago

Would really appreciate it if you could take a look and check whether I got any settings wrong. @benellis3 Here is the full config I ran with:

{"action_selector": "epsilon_greedy", "agent": "n_rnn", "agent_output_type": "q", "batch_size": 128, "batch_size_run": 4, "buffer_cpu_only": true, "buffer_size": 5000, "checkpoint_path": "", "critic_lr": 0.0005, "env": "sc2wrapped", "env_args": { "capability_config": { "n_units": 5, "start_positions": { "dist_type": "surrounded_and_reflect", "map_x": 32, "map_y": 32, "n_enemies": 5, "p": 0.5 }, "team_gen": { "dist_type": "weighted_teams", "observe": true, "unit_types": [ "stalker", "zealot", "colossus" ], "weights": [ 0.45, 0.45, 0.1 ] } }, "conic_fov": true, "continuing_episode": false, "debug": false, "difficulty": "7", "game_version": null, "heuristic_ai": false, "map_name": "10gen_protoss", "move_amount": 2, "num_fov_actions": 12, "obs_all_health": true, "obs_instead_of_state": false, "obs_last_action": false, "obs_own_health": true, "obs_own_pos": true, "obs_pathing_grid": false, "obs_terrain_height": false, "obs_timestep_number": false, "replay_dir": "", "replay_prefix": "", "reward_death_value": 10, "reward_defeat": 0, "reward_negative_scale": 0.5, "reward_only_positive": true, "reward_scale": true, "reward_scale_rate": 20, "reward_sparse": false, "reward_win": 200, "state_last_action": true, "state_timestep_number": false, "step_mul": 8 }, "epsilon_anneal_time": 100000, "epsilon_finish": 0.05, "epsilon_start": 1.0, "evaluate": false, "gamma": 0.99, "grad_norm_clip": 10, "group": "smacv2", "hypernet_embed": 64, "label": "default_label", "learner": "nq_learner", "learner_log_interval": 2000, "load_step": 0, "local_results_path": "results", "log_interval": 2000, "lr": 0.001, "mac": "n_mac", "mixer": "qmix", "mixing_embed_dim": 32, "name": "qmix", "obs_agent_id": true, "obs_last_action": true, "optim_alpha": 0.99, "optim_eps": 1e-05, "optimizer": "adam", "project": "smacv2", "q_lambda": false, "repeat_id": 1, "rnn_hidden_dim": 64, "run": "default", "runner": "parallel", "runner_log_interval": 2000, "save_model": false, "save_model_interval": 2000000, "save_replay": false, "seed": 619904269, "t_max": 10050000, "target_update_interval": 200, "td_lambda": 0.4, "test_greedy": true, "test_interval": 10000, "test_nepisode": 32, "use_cuda": true, "use_tensorboard": false, "use_wandb": true }

benellis3 commented 1 year ago

Hi. Apologies, this was my fault. The branch I used for the experiments in the paper was the ranges branch, not master. I have now merged the ranges branch into master, so if you git pull and re-run your experiments you should hopefully be able to reproduce the results.
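In concrete terms, the update step would look roughly like the sketch below. It assumes both benellis3/pymarl2 and the smacv2 package were cloned locally; the directory names and remote name are illustrative, so adjust them to your setup.

    # Pull the newly merged master in both local checkouts (paths are assumptions).
    # If smacv2 was installed with `pip install -e .`, pulling is enough;
    # otherwise reinstall the package after pulling.
    cd ~/smacv2
    git pull origin master

    cd ~/pymarl2
    git pull origin master

    # Re-run the 5_vs_5 Protoss experiment with the same overrides as before.
    python3 src/main.py --config=qmix --env-config=sc2_gen_protoss with env_args.capability_config.n_units=5 env_args.capability_config.start_positions.n_enemies=5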

benellis3 commented 1 year ago

Let me know if you have further problems 😄

demo111demo commented 1 year ago

Thank you so much for your prompt fix! I will try it right away.