Closed kjsato closed 3 months ago
Thanks @kjsato for reporting!
Is the hub online somewhere? It would be instructive to also have access to the config file, as I suspect your feeling is correct: `null` values in both `required` and `optional` are currently not supported and would likely trigger an error when validating the hub's config.
Also, what version of `hubValidations` are you using? I believe `pad_missing_cols` was only used briefly in an older version, so upgrading `hubValidations` might solve the problem.
@annakrystalli Thanks for your early comments. Yes, the hub is already up, so as a first workaround I'll remove the corresponding part. The attached tgz, which contains a config set, can be used to reproduce the issue.
As I noted, the version is the recent 0.0.0.9005. If I could access earlier revisions, I would like to try one of them, because I might not see the issue there. (I would appreciate it if you could tell me whether there is a way, because I failed to use install_github with the version specified as "@0.0.0.9004".)
Thanks @kjsato. The latest version of hubValidations is 0.0.0.9008. If you use `remotes::install_github("Infectious-Disease-Modeling-Hubs/hubValidations")` you will get the latest version. Please try and let me know how it goes.
The strange thing is that setting any of these `required` or `optional` values to something other than null still caused an error, and I could not find a workaround. There is a possibility that another issue is hidden (the details are still unknown, sorry).
okay thanks, I will try it
Sorry, no change. With hubValidations version 0.0.0.9008 I get the same result:
✖ 2023-11-26-teamsam-modelple.parquet: EXEC ERROR: Error in purrr::map(x, ~pad_missing_cols(.x, all_cols)) : ℹ In index: 4. Caused by error in `value[[jvseq[[jjj]]]]`: ! subscript out of bounds
Hi Koji. OK, I looked into it and the problem arises in the function `expand_model_out_val_grid` (now in `hubData`), which produces a grid of valid value combinations. Currently we do not allow both the `required` and `optional` properties to be `null` in the `tasks.json` config file, and it seems you have that issue in three places in yours: twice when specifying `horizon` and once when specifying the `cdf` output type at the bottom. When I add a value to the `optional` property in all three places, the validation proceeds as expected:
validate_submission(hub_path=".",file_path="teamsam-modelple/2023-11-26-teamsam-modelple.parquet")
✔ sample: All hub config files are valid.
✔ 2023-11-26-teamsam-modelple.parquet: File exists at path model-output/teamsam-modelple/2023-11-26-teamsam-modelple.parquet.
✔ 2023-11-26-teamsam-modelple.parquet: File name "2023-11-26-teamsam-modelple.parquet" is valid.
✔ 2023-11-26-teamsam-modelple.parquet: File directory name matches `model_id` metadata in file name.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id` is valid.
✔ 2023-11-26-teamsam-modelple.parquet: File is accepted hub format.
✔ 2023-11-26-teamsam-modelple.parquet: Metadata file exists at path model-metadata/teamsam-modelple.yaml.
✔ 2023-11-26-teamsam-modelple.parquet: File could be read successfully.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id_col` name is valid.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id` column "origin_date" contains a single, unique round ID value.
✔ 2023-11-26-teamsam-modelple.parquet: All `round_id_col` "origin_date" values match submission `round_id` from file name.
✔ 2023-11-26-teamsam-modelple.parquet: Column names are consistent with expected round task IDs and std column names.
✔ 2023-11-26-teamsam-modelple.parquet: Column data types match hub schema.
✔ 2023-11-26-teamsam-modelple.parquet: `tbl` contains valid values/value combinations.
✔ 2023-11-26-teamsam-modelple.parquet: All combinations of task ID column/`output_type`/`output_type_id` values are unique.
✔ 2023-11-26-teamsam-modelple.parquet: Required task ID/output type/output type ID combinations all present.
✔ 2023-11-26-teamsam-modelple.parquet: Values in column `value` all valid with respect to modeling task config.
✔ 2023-11-26-teamsam-modelple.parquet: Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID value/output type
combinations of quantile or cdf output types.
ℹ 2023-11-26-teamsam-modelple.parquet: No pmf output types to check for sum of 1. Check skipped.
✔ 2023-11-26-teamsam-modelple.parquet: Submission time is within accepted submission window for round.
There has been some discussion about potentially supporting `null` in both `required` and `optional` (see https://github.com/Infectious-Disease-Modeling-Hubs/hubAdmin/issues/4), but so far it has not been agreed on.
Having said that, what worries me is that the `tasks.json` file was validated as correct when it actually should not have been, so you should have been notified much earlier that this would be a problem.
If supporting `null` values is something that needs to be considered, I propose that we have that discussion as a group, as it will influence whether I implement the missing checks on the `tasks.json` files.
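As a rough illustration of the kind of missing check discussed above, the following Python sketch walks a parsed `tasks.json` and reports every object in which both `required` and `optional` are `null`. It is not part of hubValidations or hubAdmin; the helper name `find_null_pairs` and the trimmed-down sample config are assumptions made purely for this example.

```python
import json


def find_null_pairs(node, path=""):
    """Yield the path of every object whose "required" and "optional" are both null.

    Hypothetical helper for illustration only; not a hubValidations/hubAdmin function.
    """
    if isinstance(node, dict):
        if (
            "required" in node
            and "optional" in node
            and node["required"] is None
            and node["optional"] is None
        ):
            yield path
        for key, child in node.items():
            yield from find_null_pairs(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_null_pairs(child, f"{path}/{index}")


# Trimmed-down, made-up fragment with the same shape as the problematic config.
sample = json.loads("""
{
  "rounds": [
    {
      "model_tasks": [
        {
          "task_ids": {
            "target": {"required": null, "optional": ["peak time hosp"]},
            "horizon": {"required": null, "optional": null}
          },
          "output_type": {
            "cdf": {
              "output_type_id": {"required": null, "optional": null},
              "value": {"type": "double", "minimum": 0, "maximum": 1}
            }
          }
        }
      ]
    }
  ]
}
""")

null_pairs = list(find_null_pairs(sample))
for location in null_pairs:
    print(location)
```

To run it against a real hub, one would replace `sample` with the result of `json.load` on the hub's own `tasks.json` file.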
@shauntruelove @LucieContamin any thoughts on the above?
For completeness, I'm attaching the `tasks.json` that is causing the issues:
{
"schema_version": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.0/tasks-schema.json",
"rounds": [
{
"round_id_from_variable": true,
"round_id": "origin_date",
"model_tasks": [
{
"task_ids": {
"origin_date": {
"required": null,
"optional": [
"2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
"2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
"2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
"2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
"2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
"2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
"2024-04-28", "2024-05-05", "2024-05-12"
]
},
"target": {
"required": ["inc hosp"],
"optional": null
},
"horizon": {
"required": [1, 2, 3, 4],
"optional": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
},
"location": {
"required": null,
"optional": [
"US",
"01",
"02",
"04",
"05",
"06",
"08",
"09",
"10",
"11",
"12",
"13",
"15",
"16",
"17",
"18",
"19",
"20",
"21",
"22",
"23",
"24",
"25",
"26",
"27",
"28",
"29",
"30",
"31",
"32",
"33",
"34",
"35",
"36",
"37",
"38",
"39",
"40",
"41",
"42",
"44",
"45",
"46",
"47",
"48",
"49",
"50",
"51",
"53",
"54",
"55",
"56",
"72",
"78"
]
},
"age_group":{
"required":["0-130"],
"optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
}
},
"output_type": {
"quantile":{
"output_type_id":{
"required": [
0.01,
0.025,
0.05,
0.1,
0.15,
0.2,
0.25,
0.3,
0.35,
0.4,
0.45,
0.5,
0.55,
0.6,
0.65,
0.7,
0.75,
0.8,
0.85,
0.9,
0.95,
0.975,
0.99
],
"optional":null
},
"value":{
"type":"double",
"minimum":0
}
}
},
"target_metadata": [
{
"target_id": "inc hosp",
"target_name": "Weekly incident RSV hospitalizations",
"target_units": "count",
"target_keys": {
"target": ["inc hosp"]
},
"target_type": "continuous",
"is_step_ahead": true,
"time_unit": "week"
}
]
},
{
"task_ids": {
"origin_date": {
"required": null,
"optional": [
"2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
"2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
"2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
"2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
"2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
"2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
"2024-04-28", "2024-05-05", "2024-05-12"
]
},
"target": {
"required": null,
"optional": ["inc hosp", "cum hosp"]
},
"horizon": {
"required": [1, 2, 3, 4],
"optional": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
},
"location": {
"required": null,
"optional": [
"US",
"01",
"02",
"04",
"05",
"06",
"08",
"09",
"10",
"11",
"12",
"13",
"15",
"16",
"17",
"18",
"19",
"20",
"21",
"22",
"23",
"24",
"25",
"26",
"27",
"28",
"29",
"30",
"31",
"32",
"33",
"34",
"35",
"36",
"37",
"38",
"39",
"40",
"41",
"42",
"44",
"45",
"46",
"47",
"48",
"49",
"50",
"51",
"53",
"54",
"55",
"56",
"72",
"78"
]
},
"age_group":{
"required":["0-130"],
"optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
}
},
"output_type": {
"sample":{
"output_type_id":{
"required": null,
"optional":[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
},
"value":{
"type":"double",
"minimum":0
}
}
},
"target_metadata": [
{
"target_id": "inc hosp",
"target_name": "Weekly incident RSV hospitalizations",
"target_units": "count",
"target_keys": {
"target": ["inc hosp"]
},
"target_type": "discrete",
"is_step_ahead": true,
"time_unit": "week"
},
{
"target_id": "cum hosp",
"target_name":"Weekly incident cumulative RSV hospitalizations",
"target_units":"count",
"target_keys":{
"target":["cum hosp"]
},
"target_type": "discrete",
"is_step_ahead": true,
"time_unit": "week"
}
]
},
{
"task_ids": {
"origin_date": {
"required": null,
"optional": [
"2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
"2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
"2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
"2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
"2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
"2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
"2024-04-28", "2024-05-05", "2024-05-12"
]
},
"target": {
"required": null,
"optional": ["peak size hosp"]
},
"horizon": {
"required": null,
"optional": null
},
"location": {
"required": null,
"optional": [
"US",
"01",
"02",
"04",
"05",
"06",
"08",
"09",
"10",
"11",
"12",
"13",
"15",
"16",
"17",
"18",
"19",
"20",
"21",
"22",
"23",
"24",
"25",
"26",
"27",
"28",
"29",
"30",
"31",
"32",
"33",
"34",
"35",
"36",
"37",
"38",
"39",
"40",
"41",
"42",
"44",
"45",
"46",
"47",
"48",
"49",
"50",
"51",
"53",
"54",
"55",
"56",
"72",
"78"
]
},
"age_group":{
"required":["0-130"],
"optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
}
},
"output_type": {
"quantile":{
"output_type_id":{
"required":[0.01,0.025,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.975,0.99],
"optional":[0,1]
},
"value":{
"type":"double",
"minimum":0
}
}
},
"target_metadata": [
{
"target_id": "peak size hosp",
"target_name": "Peak size of hospitalization",
"target_units": "count",
"target_keys": {
"target": ["peak size hosp"]
},
"target_type": "discrete",
"is_step_ahead": false
}
]
},
{
"task_ids": {
"origin_date": {
"required": null,
"optional": [
"2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
"2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
"2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
"2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
"2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
"2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
"2024-04-28", "2024-05-05", "2024-05-12"
]
},
"target": {
"required": null,
"optional": ["peak time hosp"]
},
"horizon": {
"required": null,
"optional": null
},
"location": {
"required": null,
"optional": [
"US",
"01",
"02",
"04",
"05",
"06",
"08",
"09",
"10",
"11",
"12",
"13",
"15",
"16",
"17",
"18",
"19",
"20",
"21",
"22",
"23",
"24",
"25",
"26",
"27",
"28",
"29",
"30",
"31",
"32",
"33",
"34",
"35",
"36",
"37",
"38",
"39",
"40",
"41",
"42",
"44",
"45",
"46",
"47",
"48",
"49",
"50",
"51",
"53",
"54",
"55",
"56",
"72",
"78"
]
},
"age_group":{
"required":["0-130"],
"optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
}
},
"output_type": {
"cdf":{
"output_type_id":{
"required":null,
"optional":null
},
"value":{
"type":"double",
"minimum":0,
"maximum":1
}
}
},
"target_metadata": [
{
"target_id": "peak time hosp",
"target_name": "Peak timing of hospitalization",
"target_units": "population",
"target_keys": {
"target": ["peak time hosp"]
},
"target_type": "discrete",
"is_step_ahead": true,
"time_unit": "week"
}
]
}
],
"submissions_due": {
"relative_to": "origin_date",
"start": -6,
"end": 100
}
}
]
}
Thanks @annakrystalli It is very helpful!
Hi @kjsato, I'm posting a link to a comment in a related issue: https://github.com/Infectious-Disease-Modeling-Hubs/hubAdmin/issues/4#issuecomment-1973207096
It is probably best to follow the conversation there, but for a quick fix you could try using `["NA"]` instead of `null` in, e.g., the `optional` property of `horizon`. I think you would definitely need some sort of value in the `cdf` `output_type_id` specification, or to remove that output type altogether from the modeling task where it's not required.
Note the above assumes that if someone submits forecasts for the modeling task that does not require a `horizon` value, the value in the `horizon` column of the file is `NA` in the relevant rows.
Note as well that this hasn't been fully tested yet, so until it's agreed on as something we're supporting and tested, there may be unexpected behaviour. Please feel free to report any such behaviour if you encounter it 👍
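For anyone wanting to try the quick fix on a copy of their config, a minimal Python sketch of the substitution could look like the one below. It only touches `task_ids` entries (such as `horizon`) and leaves output types like `cdf` alone, since those need a real value or removal. The function name `patch_null_task_ids` and the small sample config are hypothetical, and, as noted above, the `["NA"]` workaround itself has not been fully tested.

```python
import copy


def patch_null_task_ids(config, placeholder="NA"):
    """Return a copy of the config in which every task ID whose "required" and
    "optional" are both null (or absent) gets optional = [placeholder].

    Hypothetical helper for illustration; output_type entries are not modified.
    """
    patched = copy.deepcopy(config)
    for round_cfg in patched.get("rounds", []):
        for model_task in round_cfg.get("model_tasks", []):
            for spec in model_task.get("task_ids", {}).values():
                if spec.get("required") is None and spec.get("optional") is None:
                    spec["optional"] = [placeholder]
    return patched


# Made-up fragment mirroring the "peak size hosp" modeling task.
config = {
    "rounds": [
        {
            "model_tasks": [
                {
                    "task_ids": {
                        "target": {"required": None, "optional": ["peak size hosp"]},
                        "horizon": {"required": None, "optional": None},
                    }
                }
            ]
        }
    ]
}

patched = patch_null_task_ids(config)
print(patched["rounds"][0]["model_tasks"][0]["task_ids"]["horizon"])
```

The original `config` is left unchanged, so the patched version can be compared against it before anything is written back to `tasks.json`.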
Hi @annakrystalli, thanks for your advice. I'll follow it.
@annakrystalli Sorry for the late notice: the update worked when I tested it on horizon. Thanks.
Thanks for letting me know @kjsato!!
When invoking validate_submission(), this error occurred, apparently because a rule for another model_task (task_ids) is triggered even though the target included in the submitted data is different.
version: hubValidations 0.0.0.9005
R: 4.3.2 (w/ RStudio 2023.12.1+402 (2023.12.1+402))
Reproducibility: Yes
memo
sample.tgz