Hierarchical reconciliation

We have a competition in which independent models provide predictions at two different levels of aggregation. It would be interesting to see how the (best) country-level predictions compare to the (best) grid-level predictions aggregated to the country level. Which models are best calibrated to the country-level aggregated outcomes (I would assume the country-level models), and how large are the differences in calibration?

Adding methods that automatically cast grid-level predictions to the country level would make it very easy to compare models across aggregation levels.
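A minimal sketch of what such a cast could look like, assuming predictions are stored as equal-length lists of sample draws per grid cell and month, and that a cell-to-country lookup is available. The names `aggregate_to_country`, `grid_draws`, and `cell_to_country` are hypothetical and not part of the existing code. Summing draw by draw also assumes that draw `i` means the same joint scenario in every cell; if a model sampled each cell independently, the result ignores dependence between cells.

```python
def aggregate_to_country(grid_draws, cell_to_country):
    """Sum grid-cell draws within each (country, month), draw by draw.

    grid_draws: dict mapping (cell_id, month_id) -> list of sample draws
    cell_to_country: dict mapping cell_id -> country_id (hypothetical lookup)
    """
    country_draws = {}
    for (cell_id, month_id), draws in grid_draws.items():
        country_id = cell_to_country.get(cell_id)
        if country_id is None:
            continue  # cell is not assigned to any country in the lookup
        key = (country_id, month_id)
        if key not in country_draws:
            country_draws[key] = list(draws)
            continue
        current = country_draws[key]
        if len(current) != len(draws):
            raise ValueError("all cells must provide the same number of draws")
        # Add the i-th draw of this cell to the i-th running country total.
        country_draws[key] = [a + b for a, b in zip(current, draws)]
    return country_draws


# Toy input: two cells in country "A", one cell in country "B", one month.
grid_draws = {
    (101, 1): [0, 2, 1],
    (102, 1): [1, 0, 3],
    (201, 1): [5, 4, 6],
}
cell_to_country = {101: "A", 102: "A", 201: "B"}
country_draws = aggregate_to_country(grid_draws, cell_to_country)
```

The output has the same shape as a country-level submission (one list of draws per country and month), so it could be passed to whatever country-level evaluation is already in place.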

This means, however, that we would also need to be able to evaluate country-level models using only the subset of countries that are covered by the grid-level analysis.
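As a rough illustration of the subsetting step, the sketch below filters country-level predictions to a given set of countries and scores them with a sample-based CRPS estimate. All function and variable names are hypothetical, the data are toy values, and CRPS is used here only as an example metric; it is an assumption, not necessarily the metric the evaluation should use.

```python
def restrict_to_countries(country_preds, covered):
    """Keep only (country, month) entries whose country is in `covered`."""
    return {k: v for k, v in country_preds.items() if k[0] in covered}


def crps_from_draws(draws, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(draws)
    term_obs = sum(abs(x - observed) for x in draws) / n
    term_spread = sum(abs(a - b) for a in draws for b in draws) / (2 * n * n)
    return term_obs - term_spread


def mean_crps(preds, actuals):
    """Average CRPS over the (country, month) keys present in both inputs."""
    keys = sorted(set(preds) & set(actuals))
    if not keys:
        raise ValueError("no overlapping (country, month) keys to evaluate")
    return sum(crps_from_draws(preds[k], actuals[k]) for k in keys) / len(keys)


# Toy country-level model covering three countries; the grid covers only A and B.
country_model = {("A", 1): [2, 3, 4], ("B", 1): [5, 5, 5], ("C", 1): [0, 1, 0]}
actuals = {("A", 1): 3, ("B", 1): 5, ("C", 1): 9}
covered = {"A", "B"}

restricted = restrict_to_countries(country_model, covered)
score = mean_crps(restricted, actuals)
```

Scoring both the restricted country-level model and the aggregated grid-level model with the same function and the same set of countries keeps the comparison like for like.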
