hubverse-org / hubValidations

Testing framework for hubverse hub validations
https://hubverse-org.github.io/hubValidations/

Create `validate_model_output_data()` function #5

Closed annakrystalli closed 1 year ago

annakrystalli commented 1 year ago

Overview

This function performs all checks on a model output file's contents. The list was initially compiled from the Hub Validations list Excel spreadsheet.

Some checks (e.g. of output_type_id and value properties) will need to be performed on splits of the data. Splits will be dictated by:

Also, checks will likely differ depending on whether a hub is configured with round_id_from_variable: true or not (i.e. whether round-specific configurations apply or not). See #6 for advice on how to determine submission round IDs.
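As a rough illustration of the round-specific branching described above, the sketch below assumes a round entry shaped like one element of `rounds` in tasks.json, with a boolean `round_id_from_variable` and a `round_id` field that names a task ID column when the flag is true and holds the literal round ID otherwise. The helper name `submission_round_ids` and the sample rows are hypothetical; #6 remains the reference for how this should actually be done.

```python
def submission_round_ids(round_config, rows):
    """Return the sorted round IDs that a submission's rows refer to.

    Assumption: `round_config` mimics one entry of `rounds` in tasks.json.
    """
    if round_config.get("round_id_from_variable", False):
        # round_id names a task ID column; the IDs are that column's values.
        column = round_config["round_id"]
        return sorted({row[column] for row in rows})
    # Otherwise round_id is itself the single round identifier.
    return [round_config["round_id"]]


# Hypothetical submission rows and round configurations.
rows = [
    {"origin_date": "2023-06-30", "location": "03", "value": 0},
    {"origin_date": "2023-06-30", "location": "04", "value": 2},
]
from_variable = {"round_id_from_variable": True, "round_id": "origin_date"}
fixed_round = {"round_id_from_variable": False, "round_id": "round-1"}

print(submission_round_ids(from_variable, rows))
print(submission_round_ids(fixed_round, rows))
```

A submission that yields more than one round ID in the first case could then be rejected before any round-specific checks run.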

General checks

Task id combinations

Date checks

Scenario Hub checks:

Output Type specific checks:

_Many of these checks will likely already be defined in tasks.json and should be composable automatically from it, e.g. see_ https://github.com/Infectious-Disease-Modeling-Hubs/hubUtils/blob/main/R/check_input.R

mean / median

quantile

cdf

cdf / pmf

Target specific checks

Cumulative count target types:

Counts in general

LucieContamin commented 1 year ago

For the question:

For each combination of task_id values, the projected values are not identical for the complete time series (?). (Original check description: For each group of location/target/scenario (and age_group if necessary), the projected values are not identical for the complete time series. @LucieContamin not sure what this check means, could you help me understand?)

What we are testing here is whether, for a specific combination of task_id values, the value column holds the same projected value across the complete time series. For example, in a specific round, for a location, scenario A, incident deaths, sample 1, the projected value is the same across the complete projected time series (all horizons):

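| origin_date | scenario_id | location | target | horizon | output_type | output_type_id | value |
|---|---|---|---|---|---|---|---|
| 2023-06-30 | A | 03 | inc death | 1 | sample | 1 | 0 |
| 2023-06-30 | A | 03 | inc death | 2 | sample | 1 | 0 |
| 2023-06-30 | A | 03 | inc death | 3 | sample | 1 | 0 |
| 2023-06-30 | A | 03 | inc death | ... | sample | 1 | 0 |
| 2023-06-30 | A | 03 | inc death | 104 | sample | 1 | 0 |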

It might be worth applying this check only to long-term projections. Also, for the US Scenario Modeling Hub, it returns only a warning rather than an error: a constant series can legitimately occur, but we want to be sure it is what the team expects from their projections and not a mistake.
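To make the check concrete, here is a minimal Python sketch (not the hubValidations implementation). It assumes rows are plain dictionaries using the column names from the example above, and that every column other than `horizon` and `value` is part of the grouping key. The function name `find_constant_series` is hypothetical.

```python
from collections import defaultdict


def find_constant_series(rows, series_column="horizon", value_column="value"):
    """Return the column combinations whose value is identical at every horizon."""
    groups = defaultdict(list)
    for row in rows:
        # Group on every column except the time index and the value itself.
        key = tuple(sorted(
            (name, val) for name, val in row.items()
            if name not in (series_column, value_column)
        ))
        groups[key].append(row[value_column])
    # Flag groups with more than one point and a single distinct value.
    return [
        dict(key) for key, values in groups.items()
        if len(values) > 1 and len(set(values)) == 1
    ]


# Hypothetical data: sample 1 is constant over 104 horizons, sample 2 is not.
base = {"origin_date": "2023-06-30", "scenario_id": "A", "location": "03",
        "target": "inc death", "output_type": "sample"}
rows = []
for horizon in range(1, 105):
    rows.append({**base, "horizon": horizon, "output_type_id": 1, "value": 0})
    rows.append({**base, "horizon": horizon, "output_type_id": 2, "value": horizon})

flagged = find_constant_series(rows)
for combo in flagged:
    # Reported as a warning rather than an error, as described above.
    print(f"Warning: identical value at every horizon for {combo}")
```

Returning the flagged combinations rather than raising leaves it to the caller to decide whether a hit is a warning or an error for a given hub.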