hubverse-org / hubValidations

Testing framework for hubverse hub validations
https://hubverse-org.github.io/hubValidations/

EXEC ERROR: Error in purrr::map(x, ~pad_missing_cols(.x, all_cols)) occurred when invoking validate_submission() #75

Closed kjsato closed 3 months ago

kjsato commented 4 months ago

When invoking validate_submission(), the error below occurs. It seems to happen because a rule for another model_task (task_ids) is triggered even though the target in the submitted data is different.


> validate_submission(hub_path=".",file_path="teamsam-modelple/2023-11-26-teamsam-modelple.parquet")
✔ rsv-forecast-hub-kjsato: All hub config files are valid.
✔ 2023-11-26-teamsam-modelple.parquet: File exists at path
  model-output/teamsam-modelple/2023-11-26-teamsam-modelple.parquet.
✔ 2023-11-26-teamsam-modelple.parquet: File name "2023-11-26-teamsam-modelple.parquet" is valid.
✔ 2023-11-26-teamsam-modelple.parquet: File directory name matches `model_id` metadata in file name.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id` is valid.
✔ 2023-11-26-teamsam-modelple.parquet: File is accepted hub format.
✔ 2023-11-26-teamsam-modelple.parquet: Metadata file exists at path model-metadata/teamsam-modelple.yaml.
✔ 2023-11-26-teamsam-modelple.parquet: File could be read successfully.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id_col` name is valid.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id` column "origin_date" contains a single, unique round ID value.
✔ 2023-11-26-teamsam-modelple.parquet: All `round_id_col` "origin_date" values match submission `round_id` from file
  name.
✔ 2023-11-26-teamsam-modelple.parquet: Column names are consistent with expected round task IDs and std column names.
✔ 2023-11-26-teamsam-modelple.parquet: Column data types match hub schema.
✖ 2023-11-26-teamsam-modelple.parquet: EXEC ERROR: Error in purrr::map(x, ~pad_missing_cols(.x, all_cols)) : ℹ In
  index: 4. Caused by error in `value[[jvseq[[jjj]]]]`: ! subscript out of bounds
✔ 2023-11-26-teamsam-modelple.parquet: Submission time is within accepted submission window for round.

sample.tgz

annakrystalli commented 4 months ago

Thanks @kjsato for reporting!

Is the hub online somewhere? It would be instructive to also have access to the config file, as I suspect your feeling is correct (currently, null values in both required and optional are not supported and would likely trigger an error when validating the hub's config).

annakrystalli commented 4 months ago

Also, what version of hubValidations are you using? I believe pad_missing_cols was only used briefly in an older version, so upgrading hubValidations might solve the problem.

kjsato commented 4 months ago

@annakrystalli Thanks for your quick comments. Yes, the hub is already up, so as a first workaround I'll remove the corresponding part. The attached tgz, which contains a config set, can be used to reproduce the issue.

As I noted, the version is the recent 0.0.0.9005. If I could access earlier revisions, I would like to try one of them, since the issue might not appear there. (I would appreciate it if you could tell me whether there is a way to do this; I failed to use install_github with a version specified as "@0.0.0.9004".)

annakrystalli commented 4 months ago

Thanks @kjsato. The latest version of hubValidations is 0.0.0.9008. If you use remotes::install_github("Infectious-Disease-Modeling-Hubs/hubValidations") you will get the latest version. Please try it and let me know how it goes.
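A quick way to confirm whether an upgrade is needed is to compare version strings before reinstalling. The following is only a sketch: `needs_upgrade()` is a hypothetical helper, not part of hubValidations, and the commented-out lines assume network access and the remotes package.

```r
# Hypothetical helper: TRUE when the installed version is older than
# the version we want to be running.
needs_upgrade <- function(installed, required) {
  package_version(installed) < package_version(required)
}

# The two versions mentioned in this thread
upgrade_needed <- needs_upgrade("0.0.0.9005", "0.0.0.9008")
upgrade_needed

# In a real session one might then run (not executed here):
# installed <- as.character(utils::packageVersion("hubValidations"))
# if (needs_upgrade(installed, "0.0.0.9008")) {
#   remotes::install_github("Infectious-Disease-Modeling-Hubs/hubValidations")
# }
#
# remotes::install_github() also accepts "user/repo@ref", where ref is a
# branch, tag, or commit SHA. Installing an older revision that way only
# works if such a ref exists in the repository, which is not confirmed here.
```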

kjsato commented 4 months ago

The strange thing is that setting any of these required/optional values to something other than null still caused an error, and I could not find a workaround. Another issue may be hidden here (details are still unknown, sorry).

kjsato commented 4 months ago

okay thanks, I will try it

kjsato commented 4 months ago

Sorry, no change. I get the same result:

✖ 2023-11-26-teamsam-modelple.parquet: EXEC ERROR: Error in purrr::map(x, ~pad_missing_cols(.x, all_cols)) : ℹ In
  index: 4. Caused by error in `value[[jvseq[[jjj]]]]`: ! subscript out of bounds

with hubValidations version 0.0.0.9008.

annakrystalli commented 4 months ago

Hi Koji. OK, I looked into it, and the problem arises in expand_model_out_val_grid (now a hubData function), which produces a grid of valid value combinations. Currently we do not allow both the required and optional properties to be null in the tasks.json config file, and it seems you have that issue in three places in yours: twice when specifying horizon and once when specifying the cdf output type at the bottom. When I add a value to the optional property in all three places, the validation proceeds as expected:

validate_submission(hub_path=".",file_path="teamsam-modelple/2023-11-26-teamsam-modelple.parquet")
✔ sample: All hub config files are valid.
✔ 2023-11-26-teamsam-modelple.parquet: File exists at path model-output/teamsam-modelple/2023-11-26-teamsam-modelple.parquet.
✔ 2023-11-26-teamsam-modelple.parquet: File name "2023-11-26-teamsam-modelple.parquet" is valid.
✔ 2023-11-26-teamsam-modelple.parquet: File directory name matches `model_id` metadata in file name.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id` is valid.
✔ 2023-11-26-teamsam-modelple.parquet: File is accepted hub format.
✔ 2023-11-26-teamsam-modelple.parquet: Metadata file exists at path model-metadata/teamsam-modelple.yaml.
✔ 2023-11-26-teamsam-modelple.parquet: File could be read successfully.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id_col` name is valid.
✔ 2023-11-26-teamsam-modelple.parquet: `round_id` column "origin_date" contains a single, unique round ID value.
✔ 2023-11-26-teamsam-modelple.parquet: All `round_id_col` "origin_date" values match submission `round_id` from file name.
✔ 2023-11-26-teamsam-modelple.parquet: Column names are consistent with expected round task IDs and std column names.
✔ 2023-11-26-teamsam-modelple.parquet: Column data types match hub schema.
✔ 2023-11-26-teamsam-modelple.parquet: `tbl` contains valid values/value combinations.
✔ 2023-11-26-teamsam-modelple.parquet: All combinations of task ID column/`output_type`/`output_type_id` values are unique.
✔ 2023-11-26-teamsam-modelple.parquet: Required task ID/output type/output type ID combinations all present.
✔ 2023-11-26-teamsam-modelple.parquet: Values in column `value` all valid with respect to modeling task config.
✔ 2023-11-26-teamsam-modelple.parquet: Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID value/output type
  combinations of quantile or cdf output types.
ℹ 2023-11-26-teamsam-modelple.parquet: No pmf output types to check for sum of 1. Check skipped.
✔ 2023-11-26-teamsam-modelple.parquet: Submission time is within accepted submission window for round.

annakrystalli commented 4 months ago

There has been some discussion about the potential for supporting null in both required and optional (see https://github.com/Infectious-Disease-Modeling-Hubs/hubAdmin/issues/4) but so far it has not been agreed on.

Having said that, what worries me is that:

  1. the tasks.json file was validated as correct when it should not have been, so you should have been notified of the problem much earlier
  2. the function itself should perhaps check for this issue and give a more informative error.

If supporting null values is something that needs to be considered, I propose that we have that discussion as a group, as it will influence whether I implement the missing checks on the tasks.json files.

@shauntruelove @LucieContamin any thoughts on the above?
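The kind of check described in the two points above could be sketched roughly as follows. This is an illustration only, not hubValidations or hubAdmin code: `find_double_nulls()` is a hypothetical helper, and it assumes the config has already been parsed into a nested list (for example with jsonlite::fromJSON(path, simplifyVector = FALSE), where JSON null becomes NULL).

```r
# Walk a parsed tasks.json (nested list) and report every property whose
# `required` and `optional` entries are both NULL.
find_double_nulls <- function(config) {
  hits <- character(0)
  for (r in seq_along(config$rounds)) {
    model_tasks <- config$rounds[[r]]$model_tasks
    for (m in seq_along(model_tasks)) {
      task <- model_tasks[[m]]
      # Task ID properties, e.g. target, horizon, location
      for (id in names(task$task_ids)) {
        prop <- task$task_ids[[id]]
        if (is.null(prop$required) && is.null(prop$optional)) {
          hits <- c(hits, sprintf("round %d, model_task %d, task_id '%s'", r, m, id))
        }
      }
      # Output types, e.g. quantile, sample, cdf
      for (ot in names(task$output_type)) {
        prop <- task$output_type[[ot]]$output_type_id
        if (is.null(prop$required) && is.null(prop$optional)) {
          hits <- c(hits, sprintf("round %d, model_task %d, output_type '%s'", r, m, ot))
        }
      }
    }
  }
  hits
}

# Minimal hand-built config mirroring the problematic pattern
demo_config <- list(rounds = list(list(model_tasks = list(
  list(
    task_ids = list(
      target  = list(required = NULL, optional = list("peak time hosp")),
      horizon = list(required = NULL, optional = NULL)
    ),
    output_type = list(
      cdf = list(output_type_id = list(required = NULL, optional = NULL))
    )
  )
))))

problems <- find_double_nulls(demo_config)
if (length(problems) > 0) {
  message(
    "Both `required` and `optional` are null in:\n",
    paste("-", problems, collapse = "\n")
  )
}
```

Run against the tasks.json from this issue, a check like this would be expected to flag the two horizon entries and the cdf output type, so the problem could be reported with its location instead of surfacing as a subscript error.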

annakrystalli commented 4 months ago

For completeness, I'm attaching the tasks.json that is causing the issues:

{
    "schema_version": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.0/tasks-schema.json",
    "rounds": [
    {
            "round_id_from_variable": true,
            "round_id": "origin_date",
            "model_tasks": [
            {
                "task_ids": {
                    "origin_date": {
                        "required": null,
                        "optional": [
                            "2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
                            "2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
                            "2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
                            "2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
                            "2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
                            "2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
                            "2024-04-28", "2024-05-05", "2024-05-12"
                            ]
                    },
                    "target": {
                        "required": ["inc hosp"],
                        "optional": null
                    },
                    "horizon": {
                        "required": [1, 2, 3, 4],
                        "optional": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
                    },
                    "location": {
                        "required": null,
                        "optional": [
                                "US",
                                "01",
                                "02",
                                "04",
                                "05",
                                "06",
                                "08",
                                "09",
                                "10",
                                "11",
                                "12",
                                "13",
                                "15",
                                "16",
                                "17",
                                "18",
                                "19",
                                "20",
                                "21",
                                "22",
                                "23",
                                "24",
                                "25",
                                "26",
                                "27",
                                "28",
                                "29",
                                "30",
                                "31",
                                "32",
                                "33",
                                "34",
                                "35",
                                "36",
                                "37",
                                "38",
                                "39",
                                "40",
                                "41",
                                "42",
                                "44",
                                "45",
                                "46",
                                "47",
                                "48",
                                "49",
                                "50",
                                "51",
                                "53",
                                "54",
                                "55",
                                "56",
                                "72",
                                "78"
                            ]
                    },
                    "age_group":{
                        "required":["0-130"],
                        "optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
                    }
                },
                "output_type": {
                    "quantile":{
                        "output_type_id":{
                            "required": [
                                0.01,
                                0.025,
                                0.05,
                                0.1,
                                0.15,
                                0.2,
                                0.25,
                                0.3,
                                0.35,
                                0.4,
                                0.45,
                                0.5,
                                0.55,
                                0.6,
                                0.65,
                                0.7,
                                0.75,
                                0.8,
                                0.85,
                                0.9,
                                0.95,
                                0.975,
                                0.99
                            ],
                            "optional":null
                        },
                        "value":{
                            "type":"double",
                            "minimum":0

                        }
                    }
                },
                "target_metadata": [
                    {
                       "target_id": "inc hosp",
                       "target_name": "Weekly incident RSV hospitalizations",
                       "target_units": "count",
                       "target_keys": {
                           "target": ["inc hosp"]
                       },
                       "target_type": "continuous",
                       "is_step_ahead": true,
                       "time_unit": "week"
                    }
                ]
            },
            {
                "task_ids": {
                 "origin_date": {
                     "required": null,
                     "optional": [
                         "2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
                         "2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
                         "2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
                         "2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
                         "2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
                         "2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
                         "2024-04-28", "2024-05-05", "2024-05-12"
                         ]
                 },
                 "target": {
                     "required": null,
                     "optional": ["inc hosp", "cum hosp"]
                 },
                 "horizon": {
                     "required": [1, 2, 3, 4],
                     "optional": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
                 },
                 "location": {
                     "required": null,
                     "optional": [
                             "US",
                             "01",
                             "02",
                             "04",
                             "05",
                             "06",
                             "08",
                             "09",
                             "10",
                             "11",
                             "12",
                             "13",
                             "15",
                             "16",
                             "17",
                             "18",
                             "19",
                             "20",
                             "21",
                             "22",
                             "23",
                             "24",
                             "25",
                             "26",
                             "27",
                             "28",
                             "29",
                             "30",
                             "31",
                             "32",
                             "33",
                             "34",
                             "35",
                             "36",
                             "37",
                             "38",
                             "39",
                             "40",
                             "41",
                             "42",
                             "44",
                             "45",
                             "46",
                             "47",
                             "48",
                             "49",
                             "50",
                             "51",
                             "53",
                             "54",
                             "55",
                             "56",
                             "72",
                             "78"
                         ]
                 },
                 "age_group":{
                     "required":["0-130"],
                     "optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
                 }
                },
                "output_type": {
                    "sample":{
                        "output_type_id":{
                            "required": null,
                            "optional":[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
                        },
                        "value":{
                        "type":"double",
                        "minimum":0
                        }
                    }
                },
                "target_metadata": [
                 {
                    "target_id": "inc hosp",
                    "target_name": "Weekly incident RSV hospitalizations",
                    "target_units": "count",
                    "target_keys": {
                        "target": ["inc hosp"]
                    },
                    "target_type": "discrete",
                    "is_step_ahead": true,
                    "time_unit": "week"
                 },
                 {
                    "target_id": "cum hosp",
                    "target_name":"Weekly incident cumulative RSV hospitalizations",
                    "target_units":"count",
                    "target_keys":{
                        "target":["cum hosp"]
                    },
                    "target_type": "discrete",
                    "is_step_ahead": true,
                    "time_unit": "week"
                 }
                ]
            },
            {
                "task_ids": {
                 "origin_date": {
                     "required": null,
                     "optional": [
                         "2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
                         "2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
                         "2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
                         "2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
                         "2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
                         "2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
                         "2024-04-28", "2024-05-05", "2024-05-12"
                         ]
                 },
                 "target": {
                     "required": null,
                     "optional": ["peak size hosp"]
                 },
                 "horizon": {
                     "required": null,
                     "optional": null
                 },
                 "location": {
                     "required": null,
                     "optional": [
                             "US",
                             "01",
                             "02",
                             "04",
                             "05",
                             "06",
                             "08",
                             "09",
                             "10",
                             "11",
                             "12",
                             "13",
                             "15",
                             "16",
                             "17",
                             "18",
                             "19",
                             "20",
                             "21",
                             "22",
                             "23",
                             "24",
                             "25",
                             "26",
                             "27",
                             "28",
                             "29",
                             "30",
                             "31",
                             "32",
                             "33",
                             "34",
                             "35",
                             "36",
                             "37",
                             "38",
                             "39",
                             "40",
                             "41",
                             "42",
                             "44",
                             "45",
                             "46",
                             "47",
                             "48",
                             "49",
                             "50",
                             "51",
                             "53",
                             "54",
                             "55",
                             "56",
                             "72",
                             "78"
                         ]
                 },
                 "age_group":{
                     "required":["0-130"],
                     "optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
                 }
                },
                "output_type": {
                    "quantile":{
                        "output_type_id":{
                            "required":[0.01,0.025,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,0.975,0.99],
                            "optional":[0,1]
                        },
                        "value":{
                            "type":"double",
                            "minimum":0
                        }
                    }
                },
                "target_metadata": [
                 {
                    "target_id": "peak size hosp",
                    "target_name": "Peak size of hospitalization",
                    "target_units": "count",
                    "target_keys": {
                        "target": ["peak size hosp"]
                    },
                    "target_type": "discrete",
                    "is_step_ahead": false
                 }
                ]
            },
            {
                "task_ids": {
                 "origin_date": {
                     "required": null,
                     "optional": [
                         "2023-11-12", "2023-11-19", "2023-11-26", "2023-12-03",
                         "2023-12-10", "2023-12-17", "2023-12-24", "2023-12-31",
                         "2024-01-07", "2024-01-14", "2024-01-21", "2024-01-28",
                         "2024-02-05", "2024-02-11", "2024-02-18", "2024-02-25",
                         "2024-03-03", "2024-03-10", "2024-03-17", "2024-03-24",
                         "2024-03-31", "2024-04-07", "2024-04-14", "2024-04-21",
                         "2024-04-28", "2024-05-05", "2024-05-12"
                         ]
                 },
                 "target": {
                     "required": null,
                     "optional": ["peak time hosp"]
                 },
                 "horizon": {
                     "required": null,
                     "optional": null
                 },
                 "location": {
                     "required": null,
                     "optional": [
                             "US",
                             "01",
                             "02",
                             "04",
                             "05",
                             "06",
                             "08",
                             "09",
                             "10",
                             "11",
                             "12",
                             "13",
                             "15",
                             "16",
                             "17",
                             "18",
                             "19",
                             "20",
                             "21",
                             "22",
                             "23",
                             "24",
                             "25",
                             "26",
                             "27",
                             "28",
                             "29",
                             "30",
                             "31",
                             "32",
                             "33",
                             "34",
                             "35",
                             "36",
                             "37",
                             "38",
                             "39",
                             "40",
                             "41",
                             "42",
                             "44",
                             "45",
                             "46",
                             "47",
                             "48",
                             "49",
                             "50",
                             "51",
                             "53",
                             "54",
                             "55",
                             "56",
                             "72",
                             "78"
                         ]
                 },
                 "age_group":{
                     "required":["0-130"],
                     "optional":["0-0.99","1-4","5-17","5-64","18-49","50-64","65-130"]
                 }
                },
                "output_type": {
                    "cdf":{
                        "output_type_id":{
                            "required":null,
                            "optional":null
                        },
                        "value":{
                            "type":"double",
                            "minimum":0,
                            "maximum":1
                        }
                    }
                },
                "target_metadata": [
                 {
                    "target_id": "peak time hosp",
                    "target_name": "Peak timing of hospitalization",
                    "target_units": "population",
                    "target_keys": {
                        "target": ["peak time hosp"]
                    },
                    "target_type": "discrete",
                    "is_step_ahead": true,
                    "time_unit": "week"
                 }
                ]
            }
        ],
        "submissions_due": {
            "relative_to": "origin_date",
            "start": -6,
            "end": 100
        }
    }
    ]
}

kjsato commented 4 months ago

Thanks @annakrystalli, this is very helpful!

annakrystalli commented 4 months ago

Hi @kjsato, I'm posting a link to a comment in a related issue: https://github.com/Infectious-Disease-Modeling-Hubs/hubAdmin/issues/4#issuecomment-1973207096

It's probably best to follow the conversation there, but for a quick fix you could try using ["NA"] instead of null in, e.g., the optional property of horizon. I think you would definitely need some sort of value in the cdf output_type_id specification, or else remove that output type altogether from the modeling task where it's not required.

Note that the above assumes that, when someone submits forecasts for a modeling task that does not require a horizon value, the horizon column of the file contains NA in the relevant rows.

Note as well that this hasn't been fully tested yet, so until it's agreed on as something we support and has been tested, there may be unexpected behaviour. Please feel free to report any such behaviour if you encounter it 👍
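To make the assumption about the horizon column concrete, here is a small sketch of what submission rows might look like under the ["NA"] workaround. The data values are invented for illustration, and the list of targets without a horizon is an assumption based on the tasks.json in this thread.

```r
# Hypothetical submission rows: step-ahead targets carry a horizon,
# while the peak target leaves it as NA.
submission <- data.frame(
  origin_date    = as.Date("2023-11-26"),
  target         = c("inc hosp", "inc hosp", "peak size hosp"),
  horizon        = c(1L, 2L, NA_integer_),
  location       = "US",
  age_group      = "0-130",
  output_type    = "quantile",
  output_type_id = 0.5,
  value          = c(120, 135, 410),  # made-up values
  stringsAsFactors = FALSE
)

# Targets whose modeling task does not define a horizon (assumed)
no_horizon_targets <- c("peak size hosp", "peak time hosp")

# Rows for those targets should have NA horizons; all others should not
peak_rows <- submission$target %in% no_horizon_targets
horizon_ok <- all(is.na(submission$horizon[peak_rows])) &&
  !anyNA(submission$horizon[!peak_rows])
horizon_ok
```

A pre-submission check like this only confirms the NA convention in the file; whether validate_submission() accepts such rows depends on the untested behaviour noted above.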

kjsato commented 4 months ago

Hi @annakrystalli, thanks for your advice. I'll follow it.

kjsato commented 3 months ago

@annakrystalli Sorry for the late update: the change worked when I tested it on horizon. Thanks.

annakrystalli commented 3 months ago

Thanks for letting me know, @kjsato!