Closed annakrystalli closed 1 year ago
For the question:
For each combination of task_id values, the projected values are not identical for the complete time series (?). (Original check description: For each group of location/target/scenario (and age_group if necessary), the projected values are not identical for the complete time series. @LucieContamin not sure what this check means, could you help me understand?)
What we are testing here is whether, for a specific combination of task_id values, the `value` column holds the same projected value across the complete time series. For example, in a specific round, for a given location, scenario A, incident death and sample 1, the projected value is the same across the complete projected time series (all horizons):
origin_date | scenario_id | location | target | horizon | output_type | output_type_id | value |
---|---|---|---|---|---|---|---|
2023-06-30 | A | 03 | inc death | 1 | sample | 1 | 0 |
2023-06-30 | A | 03 | inc death | 2 | sample | 1 | 0 |
2023-06-30 | A | 03 | inc death | 3 | sample | 1 | 0 |
2023-06-30 | A | 03 | inc death | ... | sample | 1 | 0 |
2023-06-30 | A | 03 | inc death | 104 | sample | 1 | 0 |
It might be interesting to apply this check only to long-term projections. Also, for the US Scenario Modeling Hub, it returns a warning rather than an error: a constant series can legitimately happen, but we want to be sure it is what the team expects from their projections and not a mistake.
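As an illustration of the check described above, here is a minimal Python sketch. It groups rows by every column of the example table except `horizon` and `value`, then flags groups whose value never changes. The function name `find_constant_series`, the `SERIES_KEYS` column list, and the `min_horizons` threshold (a stand-in for "only long-term projections") are assumptions made for this example, not part of any hub tooling.

```python
from collections import defaultdict

# Columns identifying one projected trajectory; horizon is deliberately excluded.
# Assumption: this list mirrors the example table, it is not a hub standard.
SERIES_KEYS = ("origin_date", "scenario_id", "location", "target",
               "output_type", "output_type_id")


def find_constant_series(rows, keys=SERIES_KEYS, min_horizons=2):
    """Return the key tuples whose value is identical at every horizon."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["value"])
    return [
        key for key, values in groups.items()
        if len(values) >= min_horizons and len(set(values)) == 1
    ]


def make_rows(sample_id, values):
    """Build illustrative rows for one sample, one row per horizon."""
    return [
        {"origin_date": "2023-06-30", "scenario_id": "A", "location": "03",
         "target": "inc death", "horizon": h, "output_type": "sample",
         "output_type_id": sample_id, "value": v}
        for h, v in enumerate(values, start=1)
    ]


rows = make_rows("1", [0, 0, 0]) + make_rows("2", [4, 7, 5])
flagged = find_constant_series(rows)
for key in flagged:
    # Reported as a warning, not an error, as described above.
    print("warning: constant projection for", dict(zip(SERIES_KEYS, key)))
```

Raising `min_horizons` restricts the check to longer series, so short projections that happen to be flat are not reported.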
Overview
This function performs all checks on a model output file's contents. The list was initially compiled from the Hub Validations list Excel spreadsheet.
Some checks (e.g. of `output_type_id` and `value` properties) will need to be performed on splits of the data. Splits will be dictated by:
Also, differences in checks will likely arise according to whether a hub is configured as `round_id_from_variable: true` or not (i.e. whether round-specific configurations apply or not). (See #6 for advice on how to determine submission round IDs.)
General checks
Task id combinations
- For each combination of task_id values, the projected values are not identical for the complete time series (?). (Original check description: _For each group of location/target/scenario (and age_group if necessary), the projected values are not identical for the complete time series._ @LucieContamin not sure what this check means, could you help me understand?)
Date checks
- If `forecast_date` and `target_end_date` are both present, validate that the dates are correct in relation to each other (i.e. all `target_end_date`s are valid with respect to `forecast_date` and `horizon`).
- `forecast_date` (or equivalent field) lies within a specified range of a specified date (e.g. the date on which the submission was made).
- All values in the `forecast_date` column are the same and match the date in the file name.
Scenario Hub checks:
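The date checks listed above could be sketched roughly as follows. This is only an illustration: the function name `check_dates`, the 7-day submission window, and the rule that `target_end_date` equals `forecast_date` plus `horizon` weeks are all assumptions, and a real hub may define horizons and windows differently.

```python
from datetime import date, timedelta


def check_dates(rows, file_date, submission_date,
                window_days=7, days_per_horizon=7):
    """Return a list of human-readable problems found in the date columns."""
    problems = []
    forecast_dates = {row["forecast_date"] for row in rows}

    # All forecast_date values are the same and match the file name date.
    if forecast_dates != {file_date}:
        problems.append("forecast_date values do not all match the file name date")

    # forecast_date lies within a window around the submission date.
    if any(abs((submission_date - d).days) > window_days for d in forecast_dates):
        problems.append("forecast_date is outside the allowed submission window")

    # target_end_date is consistent with forecast_date and horizon
    # (assumed rule: one horizon step equals days_per_horizon days).
    for row in rows:
        expected = row["forecast_date"] + timedelta(
            days=days_per_horizon * row["horizon"])
        if row["target_end_date"] != expected:
            problems.append(
                f"horizon {row['horizon']}: target_end_date should be {expected}")
    return problems


rows = [
    {"forecast_date": date(2023, 6, 26), "horizon": 1,
     "target_end_date": date(2023, 7, 3)},
    {"forecast_date": date(2023, 6, 26), "horizon": 2,
     "target_end_date": date(2023, 7, 11)},  # one day late on purpose
]
problems = check_dates(rows, file_date=date(2023, 6, 26),
                       submission_date=date(2023, 6, 27))
print(problems)
```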
Output Type specific checks:
_Many of these checks will likely already be defined in `tasks.json` and should be able to be composed automatically from that (e.g. see https://github.com/Infectious-Disease-Modeling-Hubs/hubUtils/blob/main/R/check_input.R)._
`mean` / `median`
- `output_type_id` values are `NA`
- `value`s match (or can be cast to) any data type specified in `tasks.json`
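A minimal sketch of the two mean/median checks above, in Python. It assumes that `NA` is represented as `None` once the file is loaded and that the expected data type has already been read from `tasks.json` and is passed in as a Python type. The function name `check_point_rows` is hypothetical.

```python
def check_point_rows(rows, value_type=float):
    """Check mean/median rows: output_type_id is NA and value is castable."""
    problems = []
    for i, row in enumerate(rows):
        if row["output_type"] not in ("mean", "median"):
            continue
        # Assumption: NA is represented as None after loading the file.
        if row["output_type_id"] is not None:
            problems.append(
                f"row {i}: output_type_id must be NA for {row['output_type']}")
        try:
            value_type(row["value"])
        except (TypeError, ValueError):
            problems.append(
                f"row {i}: value {row['value']!r} cannot be cast to "
                f"{value_type.__name__}")
    return problems


rows = [
    {"output_type": "mean", "output_type_id": None, "value": "12.5"},
    {"output_type": "median", "output_type_id": "0.5", "value": "abc"},
]
problems = check_point_rows(rows)
for problem in problems:
    print(problem)
```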
`quantile`
- `output_type_id` values range from 0-1
`cdf`
- `value` values range from 0-1.
- `output_type_id` values are unique (e.g. no duplicate `output_type_id` values).
`cdf` / `pmf`
- `value` values range from 0-1 and `value` values must sum to 1 (unless binary?).
- `output_type_id` values are unique (e.g. no duplicate `output_type_id` values)
- `value` must be non-decreasing as `output_type_id` increases.
Target specific checks
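The quantile, cdf and pmf checks listed above reduce to a handful of small predicates. The sketch below is illustrative only: the helper names are hypothetical, and which predicate applies to which output type is left to the caller, since that mapping would come from the hub configuration.

```python
import math


def in_unit_interval(xs):
    """True if every number lies in [0, 1]."""
    return all(0 <= x <= 1 for x in xs)


def all_unique(xs):
    """True if there are no duplicate entries (e.g. output_type_id values)."""
    return len(set(xs)) == len(xs)


def sums_to_one(xs, tol=1e-9):
    """True if the values sum to 1 within a small floating-point tolerance."""
    return math.isclose(math.fsum(xs), 1.0, abs_tol=tol)


def non_decreasing_by_id(pairs):
    """pairs is a list of (output_type_id, value); True if value never drops
    as output_type_id increases."""
    values = [value for _, value in sorted(pairs)]
    return all(a <= b for a, b in zip(values, values[1:]))


quantile_levels = [0.25, 0.5, 0.75]
pmf_values = [0.2, 0.5, 0.3]
cdf_pairs = [(10, 0.1), (20, 0.6), (30, 0.55)]  # last value drops on purpose

print(in_unit_interval(quantile_levels), all_unique(quantile_levels))
print(sums_to_one(pmf_values))
print(non_decreasing_by_id(cdf_pairs))
```

Comparing the sum with a tolerance rather than `== 1` avoids rejecting valid submissions because of floating-point rounding.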
Cumulative count target types:
- `value` for the "cumulative count" is equal to or higher than the observed cumulative death count for the previous week (week 0) or the week before that (week -1), depending on availability, before the projection start date.
- `value`s for the "cumulative count" are not decreasing with time
Counts in general
- `value` is less than the location's population size
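The count checks above can be sketched in a few lines of Python. The function names, the dict-of-horizons input shape, and the idea that the last observed count and the population size are supplied by the caller are assumptions for this example. Where that reference data comes from is not specified here.

```python
def check_cumulative(values_by_horizon, last_observed):
    """values_by_horizon maps horizon -> projected cumulative count."""
    ordered = [values_by_horizon[h] for h in sorted(values_by_horizon)]
    problems = []
    # Projected cumulative counts should not fall below the last observation.
    if any(v < last_observed for v in ordered):
        problems.append(
            "a projected cumulative value is below the last observed count")
    # Cumulative counts should not decrease with time.
    if any(later < earlier for earlier, later in zip(ordered, ordered[1:])):
        problems.append("cumulative values decrease over time")
    return problems


def values_not_below_population(values, population):
    """Return the values that are not strictly less than the population."""
    return [v for v in values if v >= population]


print(check_cumulative({1: 105, 2: 110, 3: 108}, last_observed=100))
print(check_cumulative({1: 95, 2: 110}, last_observed=100))
print(values_not_below_population([10, 5000], population=1000))
```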