Closed moriartyjm closed 2 years ago
Some suggestions to get us started:
Quantitative criteria
General phase: Participants submit an agent, which can be constructed by any means, to the EvalAI platform. The platform will average the agent's performance in the Pathways to Net Zero competition environment over an unseen, fixed set of 1000 random seeds.
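For concreteness, here is a minimal sketch of what seed-averaged evaluation along these lines could look like, assuming a Gym-style interface. The environment id, the agent's `act` method and the seed list are illustrative assumptions, not the actual EvalAI harness:

```python
import gym
import numpy as np

SEEDS = range(1000)  # placeholder: the real set is fixed and unseen by participants

def evaluate(agent, env_id="rangl:nztc-v0"):  # hypothetical environment id
    """Average an agent's episode return over a fixed set of random seeds."""
    env = gym.make(env_id)
    returns = []
    for seed in SEEDS:
        env.seed(seed)               # classic Gym seeding API
        obs = env.reset()
        done, total_reward = False, 0.0
        while not done:
            action = agent.act(obs)  # the agent may have been constructed by any means
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        returns.append(total_reward)
    return float(np.mean(returns))   # the leaderboard score is this average
```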
Lean RL phase: Participants will be invited to submit RL training code. The code must:
(i) be contained in a single text file named train.py, not exceeding 50kb; and
(ii) be executable by Python3 within 1 hour on a standard Linux virtual machine (Azure logins will be provided) on which the Pathways to Net Zero environment is installed (the VM will not be able to communicate externally).
For the top 10 teams from Phase 1, the resulting trained agent will be submitted to EvalAI by the RangL team and its performance averaged over the same set of 1000 random seeds used in Phase 1.
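The file-size and wall-clock limits above could be checked mechanically. A rough sketch, assuming the submission is run as a standalone script; the helper below is only an illustration, not the RangL team's actual harness:

```python
import os
import subprocess

MAX_BYTES = 50 * 1024    # "single text file named train.py not exceeding 50kb"
MAX_SECONDS = 60 * 60    # "executable by Python3 within 1 hour"

def check_submission(path="train.py"):
    """Reject submissions that break the size or time limits (illustrative only)."""
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("train.py exceeds the 50kb limit")
    # Run the training code in a subprocess; TimeoutExpired is raised
    # if the 1-hour budget is exceeded. Network isolation is assumed to
    # be enforced by the VM itself, not by this script.
    subprocess.run(["python3", path], check=True, timeout=MAX_SECONDS)
```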
Qualitative criteria
Each team will be required to submit an executive summary of their entry (as a .md document of at most one A4 page in 11-point type). The executive summary should be written in narrative form and address the following points in a manner accessible to the general scientific reader:
I really like this @moriartyjm. I might just simplify the qualitative criteria as follows (merely a suggestion):
- the mix of offshore wind, blue and green hydrogen that you found to be optimal;
- why your optimal pathway performs better than the standard IEV models Breeze, Gale and Storm; and
- in what ways your approach to constructing an agent improves upon naive RL training.

We are also particularly interested in improving our standard IEV models. If you have any suggestions for enhancing any of these models, please feel free to comment. (This element is completely optional.)
Many thanks @r-saldanha, I'd be happy to go with these three questions (plus the optional part).
Interestingly, we found that training was significantly more difficult in the closed-loop problem (i.e. when the noise is observed step by step) than in the open-loop problem (when the noise is not observed). (In recent testing, the learning curve was actually negatively sloped for the closed-loop version.) The flip side is that optimal performance should be better in the closed-loop version, as the agent has more information available when deciding its actions.
@jia-chenhua has added a switch to env.py which toggles between open and closed loop. It would be very easy, and also interesting, to have separate open-loop and closed-loop phases in the competition.
If the negative learning curve is generally true across several RL algorithms, the closed-loop problem might serve as an alternative to the lean-RL phase (i.e. a problem where brute force shouldn't win).
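For illustration only, here is a sketch of how an open/closed-loop switch of the kind described above might shape the observations. The class, flag name and observation layout are hypothetical and are not the actual change made to env.py:

```python
import numpy as np

class PathwaysEnvSketch:
    """Illustrative toggle between open-loop and closed-loop observations."""

    def __init__(self, noise_observable=False):
        # False = open loop (noise hidden), True = closed loop (noise observed step by step)
        self.noise_observable = noise_observable

    def _observation(self, step, deterministic_state, realised_noise):
        if self.noise_observable:
            # Closed loop: the agent sees the noise realised so far.
            return np.concatenate([[step], deterministic_state, realised_noise])
        # Open loop: the agent sees only the deterministic part of the state.
        return np.concatenate([[step], deterministic_state])
```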
This issue develops ideas for competition evaluation, both qualitative and quantitative.