hubverse-org / hubValidations

Testing framework for hubverse hub validations
https://hubverse-org.github.io/hubValidations/

Detect and throw error when model output or model metadata files are deleted/modified. #63

annakrystalli closed this 7 months ago

annakrystalli commented 7 months ago

This PR adds functionality to validate_pr() to check for deletions of previously submitted model metadata files, and for modifications or deletions of previously submitted model output files. For each modified or deleted file it detects, an <error/check_error> class object is added to the function output.
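A minimal base R sketch of the kind of check described above, not how validate_pr() actually implements it. It assumes the PR's changed files arrive as a data frame with filename and status columns (mirroring the fields returned by GitHub's "List pull request files" REST endpoint), that model output files live under model-output/ as .csv, .parquet or .arrow files, and that metadata files live under model-metadata/ as .yml or .yaml files. The function name flag_mod_del() is hypothetical.

```r
# Hypothetical sketch, not the hubValidations implementation.
# Assumes `pr_files` has `filename` and `status` columns, as returned by
# GitHub's "List pull request files" endpoint ("added", "modified", "removed", ...).
flag_mod_del <- function(pr_files) {
  fn <- pr_files$filename
  st <- pr_files$status

  # Assumed hub layout and file formats
  is_output <- grepl("^model-output/.+\\.(csv|parquet|arrow)$", fn)
  is_metadata <- grepl("^model-metadata/.+\\.ya?ml$", fn)

  # Output files must be neither modified nor removed; metadata files must not be removed
  flagged <- (is_output & st %in% c("modified", "removed")) |
    (is_metadata & st == "removed")

  # Build one condition object of class "check_error" per flagged file
  lapply(which(flagged), function(i) {
    kind <- if (is_output[i]) "model output" else "model metadata"
    msg <- sprintf(
      "Previously submitted %s files must not be %s. '%s' %s.",
      kind, st[i], fn[i], st[i]
    )
    structure(
      list(message = msg, call = NULL, file = fn[i]),
      class = c("check_error", "error", "condition")
    )
  })
}

# Usage: a README under model-output/ is ignored, like the unchecked files in the demo
pr_files <- data.frame(
  filename = c(
    "model-output/hub-baseline/2022-10-08-hub-baseline.csv",
    "model-output/team1-goodmodel/2022-10-15-team1-goodmodel.csv",
    "model-metadata/team1-goodmodel.yaml",
    "model-output/hub-baseline/README.txt"
  ),
  status = c("modified", "removed", "removed", "modified"),
  stringsAsFactors = FALSE
)
errs <- flag_mod_del(pr_files)
vapply(errs, conditionMessage, character(1))
```

Returning condition objects rather than calling stop() immediately lets every offending file be reported in a single run, which matches the one-error-per-file output shown in the demo.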

Testing uses the following PR: https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple/pull/6

annakrystalli commented 7 months ago

Demo of the functionality, covering modification and deletion of model output files plus deletion of a model metadata file in a PR:

library(hubValidations)

# Clone the test hub into a temporary directory, checking out the
# test-mod-del branch
temp_hub <- fs::path(tempdir(), "mod_del_hub")
gert::git_clone(
    url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
    path = temp_hub,
    branch = "test-mod-del"
)

# Validate the files changed in PR #6, skipping the submission window check
validate_pr(
    hub_path = temp_hub,
    gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
    pr_number = 6,
    skip_submit_window_check = TRUE
)
#> ℹ PR contains commits to additional files which have not been checked:
#> • ".github/workflows/validate_submission.yaml"
#> • "README.md"
#> • "model-metadata/README.md"
#> • "model-output/hub-baseline/README.txt"
#> • "random-file.txt"
#> ✔ mod_del_hub: All hub config files are valid.
#> ✖ 2022-10-08-hub-baseline.csv: Previously submitted model output files must not
#>   be modified.  'model-output/hub-baseline/2022-10-08-hub-baseline.csv'
#>   modified.
#> ✖ 2022-10-15-team1-goodmodel.csv: Previously submitted model output files must
#>   not be removed.
#>   'model-output/team1-goodmodel/2022-10-15-team1-goodmodel.csv' removed.
#> ✖ team1-goodmodel.yaml: Previously submitted model metadata files must not be
#>   removed.  'model-metadata/team1-goodmodel.yaml' removed.
#> ✔ 2022-10-08-hub-baseline.csv: File exists at path
#>   'model-output/hub-baseline/2022-10-08-hub-baseline.csv'.
#> ✔ 2022-10-08-hub-baseline.csv: File name "2022-10-08-hub-baseline.csv" is
#>   valid.
#> ✔ 2022-10-08-hub-baseline.csv: File directory name matches `model_id` metadata
#>   in file name.
#> ✔ 2022-10-08-hub-baseline.csv: `round_id` is valid.
#> ✔ 2022-10-08-hub-baseline.csv: File is accepted hub format.
#> ✔ 2022-10-08-hub-baseline.csv: Metadata file exists at path
#>   'model-metadata/hub-baseline.yml'.
#> ✔ 2022-10-08-hub-baseline.csv: File could be read successfully.
#> ✔ 2022-10-08-hub-baseline.csv: `round_id_col` name is valid.
#> ✔ 2022-10-08-hub-baseline.csv: `round_id` column "origin_date" contains a
#>   single, unique round ID value.
#> ✔ 2022-10-08-hub-baseline.csv: All `round_id_col` "origin_date" values match
#>   submission `round_id` from file name.
#> ✔ 2022-10-08-hub-baseline.csv: Column names are consistent with expected round
#>   task IDs and std column names.
#> ✔ 2022-10-08-hub-baseline.csv: Column data types match hub schema.
#> ✔ 2022-10-08-hub-baseline.csv: `tbl` contains valid values/value combinations.
#> ✔ 2022-10-08-hub-baseline.csv: All combinations of task ID
#>   column/`output_type`/`output_type_id` values are unique.
#> ✔ 2022-10-08-hub-baseline.csv: Required task ID/output type/output type ID
#>   combinations all present.
#> ✔ 2022-10-08-hub-baseline.csv: Values in column `value` all valid with respect
#>   to modeling task config.
#> ✔ 2022-10-08-hub-baseline.csv: Values in `value` column are non-decreasing as
#>   output_type_ids increase for all unique task ID value/output type
#>   combinations of quantile or cdf output types.
#> ℹ 2022-10-08-hub-baseline.csv: No pmf output types to check for sum of 1. Check
#>   skipped.
#> ✔ 2022-10-22-team1-goodmodel.csv: File exists at path
#>   'model-output/team1-goodmodel/2022-10-22-team1-goodmodel.csv'.
#> ✔ 2022-10-22-team1-goodmodel.csv: File name "2022-10-22-team1-goodmodel.csv" is
#>   valid.
#> ✔ 2022-10-22-team1-goodmodel.csv: File directory name matches `model_id`
#>   metadata in file name.
#> ✔ 2022-10-22-team1-goodmodel.csv: `round_id` is valid.
#> ✔ 2022-10-22-team1-goodmodel.csv: File is accepted hub format.
#> ✖ 2022-10-22-team1-goodmodel.csv: Metadata file does not exist at path
#>   'model-metadata/team1-goodmodel.yml' or
#>   'model-metadata/team1-goodmodel.yaml'.

Created on 2023-11-28 with reprex v2.0.2
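The last error in the output arises because the PR deletes 'model-metadata/team1-goodmodel.yaml' while still submitting a team1-goodmodel output file. A rough base R sketch of that kind of lookup, assuming output file names follow the <round_id>-<model_id>.<ext> pattern with a YYYY-MM-DD round ID, as in this test hub. Both model_id_from_file() and metadata_exists() are hypothetical helpers, not hubValidations functions.

```r
# Hypothetical helpers, not part of hubValidations.
# Assumes file names like "2022-10-22-team1-goodmodel.csv" (date round ID + model ID).
model_id_from_file <- function(file_name) {
  sub("^[0-9]{4}-[0-9]{2}-[0-9]{2}-(.+)\\.[A-Za-z]+$", "\\1", basename(file_name))
}

# TRUE if a metadata file exists with either the .yml or .yaml extension
metadata_exists <- function(hub_path, model_id) {
  candidates <- file.path(
    hub_path, "model-metadata", paste0(model_id, c(".yml", ".yaml"))
  )
  any(file.exists(candidates))
}

# Usage with a throwaway hub directory containing only hub-baseline metadata
hub <- file.path(tempdir(), "sketch_hub")
dir.create(file.path(hub, "model-metadata"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(hub, "model-metadata", "hub-baseline.yml")))

metadata_exists(hub, model_id_from_file("2022-10-08-hub-baseline.csv"))    # TRUE
metadata_exists(hub, model_id_from_file("2022-10-22-team1-goodmodel.csv")) # FALSE
```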

annakrystalli commented 7 months ago

> After running the code locally, I have some questions. Some of the tests are failing locally, maybe because of the version of the rlang package:

Indeed, it is to do with the version of rlang. I believe updating rlang should make that go away.

annakrystalli commented 7 months ago

Actually, it's not the rlang version that is the issue. I think it might be a testthat version issue.

See, for example, this commit I had to make to update tests broken by a new testthat version: https://github.com/Infectious-Disease-Modeling-Hubs/hubUtils/commit/8be9dbcd80a791d6f84ec1f6005e74e94ccd0269

LucieContamin commented 7 months ago

Thanks, I missed the version information in the DESCRIPTION file, sorry about that! Updating testthat fixed the issue! Thanks again!
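When tests fail on only one machine, comparing installed package versions against the minimums declared in the package's DESCRIPTION file can quickly rule out this kind of mismatch. A base R sketch, assuming constraints are written in the usual "pkg (>= x.y.z)" form; check_min_versions() is a hypothetical helper, and the usage below reads a throwaway DESCRIPTION rather than the hubValidations one.

```r
# Hypothetical helper: compare installed versions with the minimum versions
# declared in one DESCRIPTION field (e.g. "Suggests", where testthat usually sits).
check_min_versions <- function(desc_path, field = "Suggests") {
  deps <- read.dcf(desc_path, fields = field)[1, field]
  if (is.na(deps)) return(NULL)

  # Split "pkgA (>= 1.0.0), pkgB" into entries; keep only those with a minimum version
  entries <- trimws(strsplit(deps, ",")[[1]])
  pat <- "^([A-Za-z0-9.]+)[[:space:]]*\\(>=[[:space:]]*([0-9.-]+)\\)$"
  entries <- entries[grepl(pat, entries)]
  if (length(entries) == 0) return(NULL)

  pkg <- sub(pat, "\\1", entries)
  required <- sub(pat, "\\2", entries)

  # Installed version as a string, or NA when the package is not installed
  installed <- vapply(pkg, function(p) {
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)

  ok <- mapply(function(inst, req) {
    !is.na(inst) && utils::compareVersion(inst, req) >= 0
  }, installed, required, USE.NAMES = FALSE)

  data.frame(package = pkg, required = required, installed = installed,
             ok = ok, stringsAsFactors = FALSE)
}

# Usage with a throwaway DESCRIPTION; point desc_path at a real package checkout instead
desc <- tempfile()
writeLines(c(
  "Package: demo",
  "Version: 0.0.1",
  "Suggests: utils (>= 1.0.0), notarealpkg123 (>= 1.0.0), stats"
), desc)
check_min_versions(desc)
```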

annakrystalli commented 7 months ago

Yeyyy 🚀

Thanks again for your review @LucieContamin !