hubverse-org / hubValidations

Testing framework for hubverse hub validations
https://hubverse-org.github.io/hubValidations/
Other
1 stars 3 forks source link

clearer check for model output or model metadata file deletions #43

Closed elray1 closed 7 months ago

elray1 commented 9 months ago

Currently, deleting an existing model output file results in validation failure, which I think we want. But the failure is kind of a byproduct of checking for whether the file exists. Maybe there is a clearer/more direct way to check whether a PR involves deletion of an existing model output or model metadata file?

image

annakrystalli commented 7 months ago

Working on this now. Will definitely flag deleted model-output files. Regarding model-metadata files, are we sure we are fussed about flagging whether a model metadata file is deleted? Given they exist to provide context to model output files, it feels right that missing model metadata files are checked as part of validating model output files.

For example say a team sets up a metadata file for a model they are planning to submit. In the end they decide to go with another model and therefore delete the previously submitted metadata file without ever having submitted a model out file. It seems wrong for such a PR to fail validation checks. Thoughts?

nickreich commented 7 months ago

I think saying that if a model-metadata file is deleted the validations should throw an error is a good baseline to start from. Agree that there are some more niche scenarios where maybe this could be ignored, as with the example above, but my sense is that the majority of situations where this would be relevant would be an inadvertent deletion of an existing file for a model with submissions, not a deleted file for a model that doesn't exist. We could add a check to say that this is only considered a failure if there also exists submission files for that model?

annakrystalli commented 7 months ago

OK no problem.

It'll require additional GitHub API calls and processing to determine whether model output files exist for a given metadata file which I'm not 100% sure is worth the effort. Will just throw an error if a metadata file is deleted on this first pass and we can decide at some point whether we want the extra check for associated model output files. Unless you feel strongly this should be included straight away.