We will want a script that can run (ideally automatically, via CI), take all of the model_output files for a given week, and build an ensemble from them.
The script could live in the `src` folder, or it could live external to this repo.
The script should generate the ensemble file and save it in an appropriate `hub-ensemble` folder or some such place. I will file a separate issue to create the CI to "submit" the file.
We decided that this ensemble will be a linear pool:

- for mean predictions, submit the mean of the means (for any team that didn't submit means, compute their means from their submitted samples)
- for sample predictions, from each of the M contributing models choose 100 / M samples at random, randomly distributing any remainder in the number of samples across the models
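The two rules above can be sketched in Python. This is a minimal illustration, not the eventual script: the function and variable names (`allocate_samples`, `ensemble_mean`) are hypothetical, and the real implementation will operate on hubverse model_output files rather than plain dicts.

```python
import random
from statistics import mean

def allocate_samples(model_ids, total=100, rng=None):
    """Decide how many samples to draw from each of the M contributing
    models: floor(total / M) each, with the remainder spread at random."""
    rng = rng or random.Random()
    base, remainder = divmod(total, len(model_ids))
    counts = {model: base for model in model_ids}
    # Randomly pick `remainder` models to contribute one extra sample each.
    for model in rng.sample(model_ids, remainder):
        counts[model] += 1
    return counts

def ensemble_mean(model_means, model_samples):
    """Mean of the component means; any model missing from `model_means`
    contributes the mean of its submitted samples instead. Assumes every
    contributing model submitted samples."""
    vals = [model_means.get(m, mean(model_samples[m])) for m in model_samples]
    return mean(vals)
```

For example, with three models and 100 total samples, `allocate_samples` gives two models 33 samples and one (chosen at random) 34.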
Misc. other ideas for later analyses:

- what if we randomly selected the samples rather than stratifying by model? Our guess is that this would have larger Monte Carlo variability.
- what if we repeat the random selection multiple times? What is the variability in the ensemble score? Note: we could also bootstrap individual model samples to try to get at this.
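The second idea above could be probed with a sketch like the following. Everything here is an assumption for illustration: the name `selection_variability` is hypothetical, and the ensemble mean stands in for whatever proper score we end up using.

```python
import random
from statistics import mean, stdev

def selection_variability(all_samples, total=100, reps=200, seed=0):
    """Repeat the random (non-stratified) selection of `total` samples
    and report the spread of a summary statistic across repetitions."""
    # Pool every model's samples together (the non-stratified variant).
    pooled = [s for samples in all_samples.values() for s in samples]
    rng = random.Random(seed)
    # The ensemble mean is a placeholder for a real scoring rule.
    stats = [mean(rng.sample(pooled, total)) for _ in range(reps)]
    return mean(stats), stdev(stats)
```

The reported standard deviation gives a rough sense of the Monte Carlo variability introduced by the selection step alone.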