Closed moriartyjm closed 2 years ago
Some suggestions to get us started:
Quantitative criteria
General phase: Participants submit an agent, which can be constructed by any means, to the EvalAI platform. The platform will average the agent's performance in the Pathways to Net Zero competition environment over an unseen, fixed set of 1000 random seeds.
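For concreteness, here is a minimal sketch of what seed-averaged evaluation along these lines could look like, assuming a Gym-style interface. The environment id, the agent's `act` method and the seed list are illustrative assumptions, not the actual EvalAI harness:

```python
import gym
import numpy as np

SEEDS = range(1000)  # placeholder: the real set is fixed and unseen by participants

def evaluate(agent, env_id="rangl:nztc-v0"):  # hypothetical environment id
    """Average an agent's episode return over a fixed set of random seeds."""
    env = gym.make(env_id)
    returns = []
    for seed in SEEDS:
        env.seed(seed)               # classic Gym seeding API
        obs = env.reset()
        done, total_reward = False, 0.0
        while not done:
            action = agent.act(obs)  # the agent may have been constructed by any means
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        returns.append(total_reward)
    return float(np.mean(returns))   # the leaderboard score is this average
```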
Lean RL phase: Participants will be invited to submit RL training code. The code must:
(i) be contained in a single text file named train.py, not exceeding 50kb; and
(ii) be executable by Python3 within 1 hour on a standard Linux virtual machine (Azure logins will be provided) on which the Pathways to Net Zero environment is installed (the VM will not be able to communicate externally).
For the top 10 teams from Phase 1, the resulting trained agent will be submitted to EvalAI by the RangL team and its performance averaged over the same set of 1000 random seeds used in Phase 1.
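The file-size and wall-clock limits above could be checked mechanically. A rough sketch, assuming the submission is run as a standalone script; the helper below is only an illustration, not the RangL team's actual harness:

```python
import os
import subprocess

MAX_BYTES = 50 * 1024    # "single text file named train.py not exceeding 50kb"
MAX_SECONDS = 60 * 60    # "executable by Python3 within 1 hour"

def check_submission(path="train.py"):
    """Reject submissions that break the size or time limits (illustrative only)."""
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("train.py exceeds the 50kb limit")
    # Run the training code in a subprocess; TimeoutExpired is raised
    # if the 1-hour budget is exceeded. Network isolation is assumed to
    # be enforced by the VM itself, not by this script.
    subprocess.run(["python3", path], check=True, timeout=MAX_SECONDS)
```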
Qualitative criteria
Each team will be required to submit an executive summary of their entry (as a .md document of at most one A4 page in 11-point type). The executive summary should be written in narrative form and address the following points in a manner accessible to the general scientific reader:
I really like this @moriartyjm. I might just simplify the qualitative criteria as follows (merely a suggestion):
- the mix of offshore wind, blue and green hydrogen that you found to be optimal;
- why your optimal pathway performs better than the standard IEV models Breeze, Gale and Storm; and
- in what ways your approach to constructing an agent improves upon naive RL training.

We are also particularly interested in improving our standard IEV models. If you have any suggestions for enhancing any of these models, please feel free to comment. (This element is completely optional.)
Many thanks @r-saldanha, I'd be happy to go with these three questions (plus the optional part).
Interestingly, we found that training was significantly more difficult in the closed-loop problem (i.e. when the noise is observed step by step) than in the open-loop problem (when the noise is not observed). (In recent testing, the learning curve was actually negatively sloped for the closed-loop version.) The flip side is that optimal performance should be better in the closed-loop version, as the agent has more information available when deciding its actions.
@jia-chenhua has added a switch to env.py which toggles between open and closed loop. It would be very easy, and also interesting, to have separate open-loop and closed-loop phases in the competition.
If the negative learning curve is generally true across several RL algorithms, the closed-loop problem might serve as an alternative to the lean-RL phase (i.e. a problem where brute force shouldn't win).
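For illustration only, here is a sketch of how an open/closed-loop switch of the kind described above might shape the observations. The class, flag name and observation layout are hypothetical and are not the actual change made to env.py:

```python
import numpy as np

class PathwaysEnvSketch:
    """Illustrative toggle between open-loop and closed-loop observations."""

    def __init__(self, noise_observable=False):
        # False = open loop (noise hidden), True = closed loop (noise observed step by step)
        self.noise_observable = noise_observable

    def _observation(self, step, deterministic_state, realised_noise):
        if self.noise_observable:
            # Closed loop: the agent sees the noise realised so far.
            return np.concatenate([[step], deterministic_state, realised_noise])
        # Open loop: the agent sees only the deterministic part of the state.
        return np.concatenate([[step], deterministic_state])
```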
This issue develops ideas for competition evaluation, both qualitative and quantitative.