Closed: annakrystalli closed this issue 1 year ago
Noting that I think we've said these files would have the more specific `.yml` or `.yaml` file extension.
The last item on our scoping list ("There is only one 'primary'-designated model for a given team") is outdated: we've said we would do away with the generic/nebulous primary/secondary designations in favor of more function-specific ones, e.g.:
```yaml
include_viz:
  description: >
    Indicator for whether the model should be included in the
    Hub's visualization
  type: boolean
include_ensemble:
  description: >
    Indicator for whether the model should be included in the
    Hub's ensemble
  type: boolean
include_eval:
  description: >
    Indicator for whether the model should be scored for inclusion in the
    Hub's evaluations
  type: boolean
```
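As an illustration, a team's metadata file conforming to these three properties might look like the following (the file name and field values here are hypothetical, just to show the shape):

```yaml
# Hypothetical model-metadata/teamA-modelX.yml entry
include_viz: true
include_ensemble: true
include_eval: false
```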
Q: do we want to have any default checks on the number of models per team that are included in the evaluation, viz, or ensemble? Or hub-specific config settings to specify this?
Three questions about versioning metadata schemas:
`hub-config/model-metadata-schema-v0.0.1.json`
Would we recommend that older versions be preserved in case of any updates? Unless we see a clear use case for it right now, I would suggest starting simple and not versioning the metadata schema.
Again, I'd suggest starting simple and not including default checks for the number of models per team, but we could always add them later.
On team designation, number of models, etc.: I wonder if this question should be taken out of the metadata altogether. I imagine that in the future hubs might have different criteria for inclusion (based on past performance, etc.) and scoring, as well as specific onboarding processes. Ultimately, I think any decision on whether a model is scored and/or included in the ensemble should be down to the hub maintainers rather than individual teams.
On versioning, I agree that we should keep things simple. However, I think it would be helpful to be able to keep track of any changes made to a model (soliciting and providing a platform for code submissions, rather than results submissions, would be one potential way of ensuring this).
Re: team designations, I think there was a need for this in the US COVID Forecast Hub (we had a team submitting ~7 variations on the same model for a while, and we wanted them to pick one), but this seems potentially specific to that situation, and other hubs might want to handle it differently. So it makes sense to me to keep the checks that are done by default fairly limited, and then allow hubs to add to them if they want. The proposal, then, is that our tools will not say what a hub has to collect in its model metadata files; we will just check that the metadata file exists and matches whatever is in the hub's `model-metadata-schema.json`.
And with that in place, we could allow hubs to do whatever they want in their metadata files to track changes to model methods (for example, relying on GitHub file version history, or adding some metadata structure that allows for per-round model details).
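To make the proposal concrete, here is a minimal sketch of "check that the metadata matches the hub's schema". It is a stand-in only: a real implementation would use a proper JSON-Schema validator (e.g. the `jsonschema` package) against the hub's `model-metadata-schema.json`; this toy version checks just the `required` list and per-property `type`, and the function name and example data are hypothetical.

```python
def check_model_metadata(metadata: dict, schema: dict) -> list[str]:
    """Return a list of problems found; an empty list means the file passes.

    Toy subset of JSON-Schema validation: only 'required' and
    per-property 'type' are checked here.
    """
    problems = []
    # Every field listed under "required" must be present.
    for field in schema.get("required", []):
        if field not in metadata:
            problems.append(f"missing required field: {field}")
    # Each present property must have the declared type.
    type_map = {"boolean": bool, "string": str, "number": (int, float)}
    for name, spec in schema.get("properties", {}).items():
        if name in metadata and "type" in spec:
            expected = type_map.get(spec["type"])
            if expected and not isinstance(metadata[name], expected):
                problems.append(f"{name}: expected {spec['type']}")
    return problems

# Schema fragment mirroring the include_* properties discussed above.
schema = {
    "required": ["include_viz", "include_ensemble", "include_eval"],
    "properties": {
        "include_viz": {"type": "boolean"},
        "include_ensemble": {"type": "boolean"},
        "include_eval": {"type": "boolean"},
    },
}

print(check_model_metadata({"include_viz": True, "include_ensemble": "yes"}, schema))
# -> ['missing required field: include_eval', 'include_ensemble: expected boolean']
```

The point is that the check logic lives entirely in the hub's schema file, so hubs that want team-level restrictions can encode them there without our tools having an opinion.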
Overview
This function will check the correctness of model metadata files. The checklist was compiled from the checks spreadsheet compiled across existing hubs and may be superseded by current hubverse practices (and therefore require some updating). I think the core of the required functionality is there, though.
Each of the following checks requires its own `check_meta_*()` function that returns the output of `capture_check_cnd()`:
- `.yml` or `.yaml` extension
- `model_id`