Create `validate_model_output_data()` function

Overview

This function performs all checks on a model output file's contents. The list was initially compiled from the Hub Validations list Excel spreadsheet.

Some checks (e.g. of output_type_id and value properties) will need to be performed on splits of the data. Splits will be dictated by:

Round specific model task array items.
The requirements of different output types.

Differences in checks will also likely arise depending on whether or not a hub is configured with round_id_from_variable: true (i.e. whether or not round-specific configurations apply). See #6 for advice on how to determine submission round IDs.

General checks

[x] Correct column names (i.e. only and all required task ids for a given round are included).
[x] Values in each task id column match those specified in tasks.json.
- [x] enum properties match expected data type and accepted values
- [x] numeric values conform to specified data types and ranges
- [x] dates conform to ISO format.
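The column name check above can be sketched as a set comparison. This is a minimal, language-neutral illustration in Python, not the package's implementation: the function name `check_column_names` is hypothetical, and it assumes that `output_type`, `output_type_id` and `value` are the standard columns that accompany the task id columns.

```python
def check_column_names(columns, expected_task_ids):
    """Return (missing, unexpected) column names for a model output table."""
    # Assumption: these three standard columns accompany the task id columns.
    standard = {"output_type", "output_type_id", "value"}
    expected = set(expected_task_ids) | standard
    actual = set(columns)
    missing = sorted(expected - actual)      # required but absent
    unexpected = sorted(actual - expected)   # present but not allowed
    return missing, unexpected


# Made-up example: "target" is missing and "age" is not an expected task id.
missing, unexpected = check_column_names(
    ["origin_date", "location", "horizon", "age",
     "output_type", "output_type_id", "value"],
    ["origin_date", "location", "target", "horizon"],
)
```

Returning both lists, rather than a single boolean, makes it possible to report "only and all" violations separately.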

Task id combinations

[ ] A value for each combination of all required task id values for the round is present. #10
[x] Only a single row for each combination of task id values is present. #10
[ ] For each combination of task_id values, the projected values are not identical for the complete time series (?). (Original check description: _For each group of location/target/scenario (and agegroup if necessary), the projected values are not identical for the complete time series._ @LucieContamin not sure what this check means, could you help me understand?)
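The first two checks in this list (every required combination present, no combination repeated) could be sketched as follows. The function name `check_task_id_combinations` and the example data are hypothetical, and the sketch assumes the rows passed in already belong to a single output type and output_type_id split, so that a repeated task id combination really is a duplicate.

```python
from collections import Counter
from itertools import product


def check_task_id_combinations(rows, required_values):
    """Return (missing, duplicated) task id combinations.

    rows: list of dicts, one per row of the (already split) data.
    required_values: dict mapping each task id to its required values.
    """
    task_ids = list(required_values)
    seen = Counter(tuple(row[t] for t in task_ids) for row in rows)
    # Expected combinations: the full cross product of required values.
    expected = set(product(*(required_values[t] for t in task_ids)))
    missing = sorted(expected - set(seen))
    duplicated = sorted(combo for combo, n in seen.items() if n > 1)
    return missing, duplicated


rows = [
    {"location": "US", "horizon": 1, "value": 10},
    {"location": "US", "horizon": 1, "value": 12},  # duplicate combination
    {"location": "US", "horizon": 2, "value": 11},
    {"location": "03", "horizon": 1, "value": 5},   # ("03", 2) never appears
]
missing, duplicated = check_task_id_combinations(
    rows, {"location": ["US", "03"], "horizon": [1, 2]}
)
```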

Date checks

[ ] If forecast_date and target_end_date are both present, validate that dates are correct in relation to each other. (i.e. all target_end_dates are valid with respect to forecast_date and horizon)
[ ] The forecast_date (or equivalent field) lies within a specified range of a specified date (e.g. the date on which the submission was made)
[x] All dates in a forecast_date column are the same and match the date in the file name
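A rough sketch of two of the date checks above follows. It rests on assumptions that are not stated in this issue: that the file name starts with an ISO date (`YYYY-MM-DD`), that `horizon` counts steps of a fixed number of days (7 by default), and that `target_end_date` is expected to equal `forecast_date` plus `horizon` steps. The function name `check_dates` is hypothetical.

```python
from datetime import date, timedelta


def check_dates(rows, file_name, days_per_horizon=7):
    """Return a list of problem descriptions (empty if all checks pass)."""
    problems = []
    # Assumption: the first 10 characters of the file name are an ISO date.
    file_date = date.fromisoformat(file_name[:10])
    forecast_dates = {date.fromisoformat(r["forecast_date"]) for r in rows}
    if forecast_dates != {file_date}:
        problems.append("forecast_date values do not all match the file name date")
    for i, r in enumerate(rows):
        # Assumption: target_end_date = forecast_date + horizon steps.
        expected = date.fromisoformat(r["forecast_date"]) + timedelta(
            days=days_per_horizon * r["horizon"]
        )
        if date.fromisoformat(r["target_end_date"]) != expected:
            problems.append(
                f"row {i}: target_end_date is inconsistent with forecast_date and horizon"
            )
    return problems


rows = [
    {"forecast_date": "2023-06-30", "horizon": 1, "target_end_date": "2023-07-07"},
    {"forecast_date": "2023-06-30", "horizon": 2, "target_end_date": "2023-07-15"},
]
problems = check_dates(rows, "2023-06-30-team-model.csv")
```

Hubs that define horizons or target dates differently would need a different rule for the expected date, which is why the step length is a parameter.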

Scenario Hub checks:

[ ] For Scenario Hubs, the scenario identifiers correspond to the expected values. Implemented in the SMH validation package with multiple 200 error messages (verify name, id and correct association between both)

Output Type specific checks:

_Many of these checks will likely already be defined in tasks.json and should be able to automatically be composed from that. e.g. see https://github.com/Infectious-Disease-Modeling-Hubs/hubUtils/blob/main/R/check_input.R_

`mean` / `median`

[x] all output_type_id values are NA
[x] values match (or can be cast to) any data type specified in tasks.json
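As an illustration of the two `mean` / `median` checks, here is a small sketch. It assumes NA is represented as `None` and that the data type from tasks.json has already been mapped to a Python type; `check_point_estimates` is a hypothetical name.

```python
def check_point_estimates(rows, value_type=float):
    """Check mean/median rows: output_type_id is NA and value is castable."""
    problems = []
    for i, r in enumerate(rows):
        # Assumption: NA is represented as None.
        if r["output_type_id"] is not None:
            problems.append(f"row {i}: output_type_id should be NA")
        try:
            value_type(r["value"])
        except (TypeError, ValueError):
            problems.append(
                f"row {i}: value cannot be cast to {value_type.__name__}"
            )
    return problems


rows = [
    {"output_type": "mean", "output_type_id": None, "value": "12.5"},
    {"output_type": "median", "output_type_id": 0.5, "value": "abc"},
]
problems = check_point_estimates(rows)
```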

`quantile`

[x] output_type_id values range from 0-1
[x] for each individual prediction in quantile format, quantiles are unique (e.g. no duplicate quantile values)
[x] for each individual prediction in quantile format, entries in value must be non-decreasing as quantiles increase.
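The three `quantile` checks can be sketched per individual prediction, i.e. per combination of task id values. The function name `check_quantiles` and the example rows are hypothetical.

```python
from collections import defaultdict


def check_quantiles(rows, task_ids):
    """Return (task id combination, problem) pairs for quantile rows."""
    problems = []
    groups = defaultdict(list)
    for r in rows:
        key = tuple(r[t] for t in task_ids)
        groups[key].append((float(r["output_type_id"]), float(r["value"])))
    for key, pairs in groups.items():
        levels = [q for q, _ in pairs]
        if any(q < 0 or q > 1 for q in levels):
            problems.append((key, "quantile level outside [0, 1]"))
        if len(set(levels)) != len(levels):
            problems.append((key, "duplicate quantile level"))
        # Order by quantile level, then compare neighbouring values.
        values = [v for _, v in sorted(pairs)]
        if any(b < a for a, b in zip(values, values[1:])):
            problems.append((key, "values decrease as quantile level increases"))
    return problems


rows = [
    {"location": "US", "horizon": 1, "output_type_id": 0.25, "value": 10},
    {"location": "US", "horizon": 1, "output_type_id": 0.5, "value": 12},
    {"location": "US", "horizon": 1, "output_type_id": 0.75, "value": 11},
    {"location": "US", "horizon": 2, "output_type_id": 0.25, "value": 5},
    {"location": "US", "horizon": 2, "output_type_id": 0.25, "value": 6},
    {"location": "US", "horizon": 2, "output_type_id": 0.5, "value": 7},
]
problems = check_quantiles(rows, ["location", "horizon"])
```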

`cdf`

[x] value values range from 0-1.
[x] for each individual prediction, output_type_id values are unique (e.g. no duplicate output_type_id values).

`cdf` / `pmf`

[x] value values range from 0-1 and value values must sum to 1 (unless binary?).
[x] for each individual prediction, output_type_id values are unique (e.g. no duplicate output_type_id values)
[x] for each individual prediction, entries in value must be non-decreasing as output_type_id increase.
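A sketch of the range, sum and monotonicity checks for a single prediction follows. It assumes the values are already ordered by output_type_id, applies the sum-to-1 rule to `pmf` and the non-decreasing rule to `cdf`, and uses a tolerance for floating point sums. The function names and the tolerance are assumptions.

```python
import math


def check_pmf(values, tol=1e-6):
    """Check one pmf prediction: values in [0, 1] that sum to 1."""
    problems = []
    if any(v < 0 or v > 1 for v in values):
        problems.append("value outside [0, 1]")
    # Assumption: an absolute tolerance absorbs floating point rounding.
    if not math.isclose(sum(values), 1.0, abs_tol=tol):
        problems.append("values do not sum to 1")
    return problems


def check_cdf(values):
    """Check one cdf prediction, with values ordered by output_type_id."""
    problems = []
    if any(v < 0 or v > 1 for v in values):
        problems.append("value outside [0, 1]")
    if any(b < a for a, b in zip(values, values[1:])):
        problems.append("values decrease as output_type_id increases")
    return problems


pmf_ok = check_pmf([0.2, 0.3, 0.5])
pmf_bad = check_pmf([0.2, 0.3, 0.4])
cdf_bad = check_cdf([0.1, 0.5, 0.4, 1.0])
```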

Target specific checks

Cumulative count target types:

[ ] The projected value for the "cumulative count" is equal to or higher than the observed cumulative death count for the previous week (week 0) or the week before that (week -1) (depending on availability) before the projection starting date.
[ ] The projected values for the "cumulative count" are not decreasing with time
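The two cumulative count checks could be sketched as below for one combination of task id values. It assumes the projected values are ordered by horizon and that the last observed cumulative count has been looked up separately. `check_cumulative` is a hypothetical name, and comparing only the first projected value against the observation is one possible reading of the first check.

```python
def check_cumulative(projected, last_observed):
    """projected: cumulative values ordered by horizon for one series."""
    problems = []
    # Assumption: only the first horizon is compared with the observation.
    if projected and projected[0] < last_observed:
        problems.append(
            "first projected value is below the last observed cumulative count"
        )
    if any(b < a for a, b in zip(projected, projected[1:])):
        problems.append("cumulative values decrease over time")
    return problems


ok = check_cumulative([105, 110, 110, 120], last_observed=100)
bad = check_cumulative([95, 110, 108], last_observed=100)
```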

Counts in general

[ ] value is less than the location's population size

For the question:

For each combination of task_id values, the projected values are not identical for the complete time series (?). (Original check description: For each group of location/target/scenario (and age_group if necessary), the projected values are not identical for the complete time series. @LucieContamin not sure what this check means, could you help me understand?)

What we are testing here is whether, for a specific combination of task_id values, the value column has the same projected value for the complete time series. For example, in a specific round, for a location, scenario A, incident death, sample 1, the projected value is the same for the complete projected time series (all horizons):

origin_date	scenario_id	location	target	horizon	output_type	output_type_id
2023-06-30	A	03	inc death	1	sample	1
2023-06-30	A	03	inc death	2	sample	1
2023-06-30	A	03	inc death	3	sample	1
2023-06-30	A	03	inc death	...	sample	1
2023-06-30	A	03	inc death	104	sample	1

It might be interesting to apply this check only to long-term projections. Also, for the US Scenario Modeling Hub, it only returns a warning instead of an error: it can legitimately happen, but we want to be sure it is what the team expects as a result in their projections and not an error.
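A minimal sketch of this check, returning the flat series so that the caller can decide between a warning and an error, might look like the following. The function name `find_flat_series`, the `min_length` parameter and the values in the example are all made up for illustration; the grouping columns are every column except horizon and value.

```python
from collections import defaultdict


def find_flat_series(rows, group_cols, min_length=2):
    """Return the groups whose value is identical across the whole series."""
    series = defaultdict(list)
    for r in rows:
        series[tuple(r[c] for c in group_cols)].append(r["value"])
    # min_length lets the check be limited to longer projections.
    return [
        key for key, values in series.items()
        if len(values) >= min_length and len(set(values)) == 1
    ]


base = {"origin_date": "2023-06-30", "scenario_id": "A", "location": "03",
        "target": "inc death", "output_type": "sample"}
rows = (
    # Sample 1: same (made-up) value at every horizon, so it is flagged.
    [dict(base, horizon=h, output_type_id=1, value=42) for h in (1, 2, 3)]
    # Sample 2: values change over time, so it is not flagged.
    + [dict(base, horizon=h, output_type_id=2, value=40 + h) for h in (1, 2, 3)]
)
group_cols = ["origin_date", "scenario_id", "location", "target",
              "output_type", "output_type_id"]
flat = find_flat_series(rows, group_cols)
```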
