moratodpg / imp_marl

IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL
Apache License 2.0

Reproduce questions #84

Open LXXXXR opened 2 months ago

LXXXXR commented 2 months ago

Hi,

Thank you for this awesome repository and the extremely useful wrappers.

I'm trying to reproduce the results for QMIX in the offshore wind farm (n=50) and have encountered some difficulties. I would be very grateful for your assistance.

Should I set n_owt = 25 for testing and n_comp=25 for training?

Thank you very much for your time and assistance!

PaLeroy commented 2 months ago

Hey! Thanks for your interest.

The number of wind turbines is not the number of agents, because we have 2 agents per wind turbine. I understand it can be confusing, sorry for that. For the structural environments, n_comp = n, but for the offshore wind farm, n_comp = n/2, where n is the number of agents.

Indeed, n_owt is set based on the n_comp parameter: if you look in the pymarl_wrapper, we have n_owt: n_comp.

In the reproduce README, you can find all the config file names (located here) used for our experiments.
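
If it helps, here is a minimal Python sketch (just the arithmetic described above, not the repository's API; the helper name is made up) of how the number of agents maps to n_comp and n_owt for the offshore wind farm:

```python
# Illustrative only: 2 agents per wind turbine, so n_comp = n_agents / 2,
# and the pymarl_wrapper then sets n_owt from n_comp (n_owt: n_comp).
def owf_sizes(n_agents: int) -> dict:
    n_comp = n_agents // 2
    n_owt = n_comp
    return {"n_comp": n_comp, "n_owt": n_owt}

print(owf_sizes(50))   # {'n_comp': 25, 'n_owt': 25}
print(owf_sizes(100))  # {'n_comp': 50, 'n_owt': 50}
```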

Hope this helps.

LXXXXR commented 2 months ago

Thank you so much for your prompt reply. It helps a lot!

A quick follow-up question: I'm now trying to reproduce the results for QMIX in the offshore wind farm (n=100).

I ran the QMIX algorithm with epymarl using the provided wrapper and config, and obtained a return of approximately -1216. After downloading the logs with download_heuristic_logs.sh, I ran run_heuristics.py (with env = "owf", campaign_cost = False, and n_owt = 50), which printed Reward: -5846.345048636739.

It seems that if I use the H value from the paper (-2925.0), the normalized results match the reported ones. How can I obtain H = -2925.0 as reported in the paper and in heur_read_results.ipynb?

PaLeroy commented 2 months ago

Something is indeed missing in the explanation of the evaluation part of the heuristic script.

Indeed, you correctly changed the arguments of the owf_env. However, in the eval function, you also need to specify the best inspection interval and the number of components to be inspected.

The logs downloaded by download_heuristic_logs.sh provide two folders. One contains the scores of the eval runs we performed, which you read in the notebook, and the other contains the best policies found by the search.

For example, with the owf environment (n_owt = 50) and no campaign cost, if you add a cell to the notebook, you can obtain:

import numpy as np

# Print the best heuristic parameters found by the search for this setting.
with np.load("../heur_search/results_owf/heuristics_owf_50_3ref_2023_04_15_131100.npz", allow_pickle=True) as data:
    print(data["opt_heur"])

So you need to modify the eval call by specifying these values, and you should then obtain similar results.

In this case:

else:
    # Evaluation with the best heuristic parameters found by the search
    insp_int = 5     # best inspection interval
    insp_comp = 100  # best number of components to inspect
    heuristic.eval(eval_size, insp_int, insp_comp)
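
To tie the two pieces together, here is a hedged sketch of the workflow (the file name and values come from this thread; heuristic and eval_size are assumed to already be defined as in run_heuristics.py, and the exact layout of opt_heur in the .npz is an assumption):

```python
import numpy as np

# 1) Look up the best heuristic parameters found by the search
#    (owf with n_owt = 50 and no campaign cost in this example).
with np.load("../heur_search/results_owf/heuristics_owf_50_3ref_2023_04_15_131100.npz",
             allow_pickle=True) as data:
    print(data["opt_heur"])  # for this case: inspection interval 5, 100 inspected components

# 2) Plug those values into the evaluation call of the heuristic script
#    (heuristic and eval_size are assumed to be in scope, as in run_heuristics.py).
insp_int = 5
insp_comp = 100
heuristic.eval(eval_size, insp_int, insp_comp)
```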

We will make a PR to clarify this.

@moratodpg may correct me if I am wrong somewhere.

LXXXXR commented 2 months ago

Thank you very much! It's crystal clear now!! Really appreciate the help!!