hubverse-org / hubValidations

Testing framework for hubverse hub validations
https://hubverse-org.github.io/hubValidations/

inconsistent validation results for high-precision floating point `output_type_id`s #58

Closed: elray1 closed this issue 7 months ago

elray1 commented 8 months ago

I've attached (at the bottom) two example submission files and two example tasks.json config files adapted from the FluSight setup. In both submission files, the quantile level 0.1 is written as 0.1000000000000000055511, which is the floating point representation of that value printed to 22 decimal places (at least on my machine). The two tasks.json files differ in whether the hub also includes another target with a pmf output_type whose output_type_id values are character strings.
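For anyone reproducing this, the long decimal string is simply the double nearest to 0.1 printed with more digits than usual. A minimal R check (base R only, nothing from hubValidations) shows that the string and the literal 0.1 are the same number once parsed:

```r
# Print the double nearest to 0.1 with 22 decimal places
x <- sprintf("%.22f", 0.1)
print(x)
#> [1] "0.1000000000000000055511"

# Parsing that string back yields exactly the same double as the literal 0.1
as.numeric(x) == 0.1
#> [1] TRUE
```

So as numbers the two values are identical; they only differ when compared as text.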

When I run validations on these submission files, I get the following results:

Using the tasks.json file that includes a character output_type:

For the submission file that includes the pmf outputs, I see the following output:

> hubValidations::validate_submission(
+     hub_path=".",
+     file_path="UMass-gbq_bootstrap/2023-10-28-UMass-gbq_bootstrap.csv")

...

✖ 2023-10-28-UMass-gbq_bootstrap.csv: `tbl` contains invalid values/value combinations.  Column
  `output_type_id` contains invalid value "0.1000000000000000055511".

For the submission file that does not include the pmf outputs, I get a similar result:

> hubValidations::validate_submission(
+     hub_path=".",
+     file_path="UMass-gbq_bootstrap/2023-11-04-UMass-gbq_bootstrap.csv")

...

✖ 2023-11-04-UMass-gbq_bootstrap.csv: `tbl` contains invalid values/value combinations.  Column
  `output_type_id` contains invalid value "0.1000000000000000055511".

Using the tasks.json file that does not include a character output_type:

However, for the submission file that does not include the pmf outputs, I get the following output, with no error from the valid values/value combinations check:

> hubValidations::validate_submission(
+     hub_path=".",
+     file_path="UMass-gbq_bootstrap/2023-11-04-UMass-gbq_bootstrap.csv")
✔ FluSight-forecast-hub: All hub config files are valid.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: File exists at path
  model-output/UMass-gbq_bootstrap/2023-11-04-UMass-gbq_bootstrap.csv.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: File name "2023-11-04-UMass-gbq_bootstrap.csv" is valid.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: File directory name matches `model_id` metadata in file name.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: `round_id` is valid.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: File is accepted hub format.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: Metadata file exists at path model-metadata/UMass-gbq_bootstrap.yml.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: File could be read successfully.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: `round_id_col` name is valid.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: `round_id` column "reference_date" contains a single, unique round ID
  value.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: All `round_id_col` "reference_date" values match submission `round_id`
  from file name.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: Column names are consistent with expected round task IDs and std
  column names.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: Column data types match hub schema.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: `tbl` contains valid values/value combinations.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: All combinations of task ID column/`output_type`/`output_type_id`
  values are unique.
! 2023-11-04-UMass-gbq_bootstrap.csv: Required task ID/output type/output type ID combinations missing.  See
  `missing` attribute for details.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: Values in column `value` all valid with respect to modeling task
  config.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: Values in `value` column are non-decreasing as output_type_ids
  increase for all unique task ID value/output type combinations of quantile or cdf output types.
ℹ 2023-11-04-UMass-gbq_bootstrap.csv: No pmf output types to check for sum of 1. Check skipped.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: Time differences between t0 var `reference_date` and t1 var
  `target_end_date` all match expected period of 7d 0H 0M 0S * `horizon`.
✔ 2023-11-04-UMass-gbq_bootstrap.csv: Target counts are less than location population size.
! 2023-11-04-UMass-gbq_bootstrap.csv: Submission time must be within accepted submission window for round.
  Current time 2023-10-21 18:15:30 is outside window 2023-10-29 EDT--2023-11-01 23:59:59 EDT.
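Comparing the two runs, one plausible explanation (an assumption; I haven't traced it through the hubValidations code) is that when any target in the config has a character output_type_id, the whole output_type_id column is typed as character, so submitted values are matched against the config as strings rather than as numbers. The sketch below mimics both comparisons in base R using a hypothetical subset of accepted quantile levels; it does not call hubValidations itself.

```r
# Hypothetical subset of accepted quantile levels, as they might be read from tasks.json
accepted <- c(0.01, 0.025, 0.05, 0.1, 0.15)
submitted <- "0.1000000000000000055511"

# String comparison: what could happen if output_type_id is typed as character.
# as.character(0.1) gives "0.1", which does not equal the long string.
submitted %in% as.character(accepted)
#> [1] FALSE

# Numeric comparison: what could happen if output_type_id is typed as double.
# Both sides parse to the same double, so the value matches.
as.numeric(submitted) %in% accepted
#> [1] TRUE
```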

Summary

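Ideally, whether this submission file is valid would not depend on the configuration of other targets. Here the same quantile output_type_id is rejected or accepted depending only on whether the hub also has a pmf target with character output_type_ids.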
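One way to make the check independent of other targets is to compare values numerically whenever both the submitted and accepted IDs parse as numbers, and to fall back to exact string matching otherwise. The helper below, `ids_match()`, is purely hypothetical (it is not part of hubValidations) and is only meant to illustrate the idea in base R:

```r
# Hypothetical helper (not a hubValidations function): returns TRUE for each
# submitted output_type_id that matches an accepted value, comparing numerically
# when both sides parse as numbers and as exact strings otherwise.
ids_match <- function(submitted, accepted) {
  # Non-numeric IDs such as "large_increase" become NA; silence the coercion warning
  sub_num <- suppressWarnings(as.numeric(submitted))
  acc_num <- suppressWarnings(as.numeric(accepted))
  numeric_hit <- !is.na(sub_num) & sub_num %in% acc_num[!is.na(acc_num)]
  string_hit <- submitted %in% accepted
  numeric_hit | string_hit
}

ids_match(
  c("0.1000000000000000055511", "large_increase", "0.3"),
  c("0.1", "0.5", "large_increase", "stable")
)
#> [1]  TRUE  TRUE FALSE
```

Exact numeric equality is enough for this particular case because "0.1" and "0.1000000000000000055511" parse to the same double. If configs and submissions could round values differently, a tolerance-based comparison (for example, based on `all.equal()`) would be an alternative design.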

Attached files: tasks_with_char_output_type.json, tasks_without_char_output_type.json, 2023-10-28-UMass-gbq_bootstrap.csv, 2023-11-04-UMass-gbq_bootstrap.csv