hubverse-org / hubAdmin

Utilities for administering hubverse Infectious Disease Modeling Hubs
https://hubverse-org.github.io/hubAdmin/

Write output of `create_rounds()` in JSON file #3

Closed LucieContamin closed 2 weeks ago

LucieContamin commented 10 months ago

Is there a good way to write the create_rounds() output to a JSON file in a format that does not return errors because of formatting?

See example below

library(hubUtils)

rounds <- create_rounds(
  create_round(
    round_id_from_variable = TRUE,
    round_id = "origin_date",
    model_tasks = create_model_tasks(
      create_model_task(
        task_ids = create_task_ids(
          create_task_id("origin_date",
                         required = NULL,
                         optional = c(
                           "2023-01-02",
                           "2023-01-09",
                           "2023-01-16"
                         )
          ),
          create_task_id("location",
                         required = "US",
                         optional = c("01", "02", "04", "05", "06")
          ),
          create_task_id("horizon",
                         required = 1L,
                         optional = 2:4
          )
        ),
        output_type = create_output_type(
          create_output_type_mean(
            is_required = TRUE,
            value_type = "double",
            value_minimum = 0L
          )
        ),
        target_metadata = create_target_metadata(
          create_target_metadata_item(
            target_id = "inc hosp",
            target_name = "Weekly incident influenza hospitalizations",
            target_units = "rate per 100,000 population",
            target_keys = NULL,
            target_type = "discrete",
            is_step_ahead = TRUE,
            time_unit = "week"
          )
        )
      )
    ),
    submissions_due = list(
      relative_to = "origin_date",
      start = -4L,
      end = 2L
    )
  )
)
jsonlite::write_json(create_config(rounds), 
                     "./hub-config/tasks.json")

task_err <- validate_config("./")
#> Loading required namespace: jsonvalidate
#> Warning: Schema errors detected in config file
#> './hub-config/tasks.json' validated
#> against schema
#> <https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.0/tasks-schema.json>
view_config_val_errors(task_err)
hubUtils config validation error report
Report for file ./hub-config/tasks.json using schema version v2.0.0

Column groups: Error location (instancePath, schemaPath), Schema details (keyword, message, schema), Config (data).

| instancePath | schemaPath | keyword | message | schema | data |
| --- | --- | --- | --- | --- | --- |
| schema_version | properties └schema_version └─type | type | ❌ must be string | string | https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.0/tasks-schema.json |
| rounds1 └─round_id_from_variable | properties └rounds └─items └──properties └───round_id_from_variable └────type | type | ❌ must be boolean | boolean | TRUE |
| rounds1 └─round_id | properties └rounds └─items └──properties └───round_id └────type | type | ❌ must be string | string | origin_date |
| rounds1 └─model_tasks └──1 └───task_ids └────origin_date └─────required | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────task_ids └───────properties └────────origin_date └─────────properties └──────────required └───────────type | type | ❌ must be array,null | array, null | |
| rounds1 └─model_tasks └──1 └───output_type └────mean └─────output_type_id | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────output_type └───────properties └────────mean └─────────properties └──────────output_type_id └───────────oneOf | oneOf | ❌ must match exactly one schema in oneOf | 1 required-description: When mean is required, property set to single element ‘NA’ array required-type: array required-items-const:‘NA’ required-items-maxItems: 1 optional-description: When mean is required, property set to null optional-type: null<br>2 required-description: When mean is optional, property set to null required-type: null optional-description: When mean is optional, property set to single element ‘NA’ array optional-type: array optional-items-const:‘NA’ optional-items-maxItems: 1 | required: NA |
| rounds1 └─model_tasks └──1 └───output_type └────mean └─────value └──────type | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────output_type └───────properties └────────mean └─────────properties └──────────value └───────────properties └────────────type └─────────────type | type | ❌ must be string | string | double |
| rounds1 └─model_tasks └──1 └───output_type └────mean └─────value └──────minimum | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────output_type └───────properties └────────mean └─────────properties └──────────value └───────────properties └────────────minimum └─────────────type | type | ❌ must be number,integer | number, integer | 0 |
| rounds1 └─model_tasks └──1 └───target_metadata └────1 └─────target_id | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────target_metadata └───────items └────────properties └─────────target_id └──────────type | type | ❌ must be string | string | inc hosp |
| rounds1 └─model_tasks └──1 └───target_metadata └────1 └─────target_name | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────target_metadata └───────items └────────properties └─────────target_name └──────────type | type | ❌ must be string | string | Weekly incident influenza hospitalizations |
| rounds1 └─model_tasks └──1 └───target_metadata └────1 └─────target_units | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────target_metadata └───────items └────────properties └─────────target_units └──────────type | type | ❌ must be string | string | rate per 100,000 population |
| rounds1 └─model_tasks └──1 └───target_metadata └────1 └─────target_type | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────target_metadata └───────items └────────properties └─────────target_type └──────────type | type | ❌ must be string | string | discrete |
| rounds1 └─model_tasks └──1 └───target_metadata └────1 └─────is_step_ahead | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────target_metadata └───────items └────────properties └─────────is_step_ahead └──────────type | type | ❌ must be boolean | boolean | TRUE |
| rounds1 └─model_tasks └──1 └───target_metadata └────1 └─────time_unit | properties └rounds └─items └──properties └───model_tasks └────items └─────properties └──────target_metadata └───────items └────────properties └─────────time_unit └──────────type | type | ❌ must be string | string | week |
| rounds1 └─submissions_due | properties └rounds └─items └──properties └───submissions_due └────oneOf | oneOf | ❌ must match exactly one schema in oneOf | 1 relative_to-description: Name of task id variable in relation to which submission start and end dates are calculated. relative_to-type: string start-description: Difference in days between start and origin date. start-type: integer start-format:‘NA’ end-description: Difference in days between end and origin date. end-type: integer end-format:‘NA’ required1: relative_to required2: start required3: end<br>2 relative_to-description:‘NA’ relative_to-type:‘NA’ start-description: Submission start date. start-type: string start-format: date end-description: Submission end date. end-type: string end-format: date required1: start required2: end | relative_to: origin_date, start: -4, end: 2 |

For more information, please consult the hubDocs documentation.

Created on 2023-11-03 with reprex v2.0.2

annakrystalli commented 10 months ago

Unfortunately this whole part of the work is stuck on exactly this issue: it is not at all straightforward to write valid JSON from R, primarily because R has no concept of scalars vs. arrays; everything is a vector.

Some exploration of using the schema and jsonvalidate::json_serialise() to determine what should be a scalar and what should be an array in JSON was attempted, but that also threw unexpected errors with our configs/schemas. If this is to work eventually, it will likely need a non-trivial upstream contribution to the jsonvalidate package. Given other pressing priorities, and that initial attempts showed the problem to be quite complex, I'm not sure when it might be tackled.
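
To make the scalar vs. array ambiguity concrete, here is a minimal sketch using plain lists and jsonlite only. The list `cfg` is a made-up stand-in, not a real config object. jsonlite::write_json() passes its extra arguments on to toJSON(), so the same defaults apply to the call in the reprex above.

``` r
library(jsonlite)

# Made-up stand-in for a fragment of a config (not a real hubUtils object)
cfg <- list(
  round_id = "origin_date",   # schema expects a JSON string
  required = "US",            # schema expects a JSON array
  optional = c("01", "02")
)

# Default: every vector becomes an array, including length-1 vectors
as_arrays <- toJSON(cfg)
#> {"round_id":["origin_date"],"required":["US"],"optional":["01","02"]}

# auto_unbox = TRUE: every length-1 vector becomes a scalar
as_scalars <- toJSON(cfg, auto_unbox = TRUE)
#> {"round_id":"origin_date","required":"US","optional":["01","02"]}

# NULL inside a list is written as {} unless null = "null" is set
null_default <- toJSON(list(required = NULL))
#> {"required":{}}
null_as_null <- toJSON(list(required = NULL), null = "null")
#> {"required":null}
```

Neither global setting gives both a scalar `round_id` and an array `required`, which is consistent with the mix of "must be string" and "must be array,null" errors in the report.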

LucieContamin commented 10 months ago

Yes, that's what I was wondering. I think it's OK to manually fix the JSON for now, and I agree that it's not a priority. I was asking in case you had a quick solution; I will let you know if I find one!
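
As a rough illustration of how the manual fix might be semi-automated, the sketch below marks length-1 values as scalars with jsonlite::unbox(), except under field names that are assumed to always be arrays. The helper `mark_scalars()` and its `array_fields` argument are hypothetical (not part of hubUtils or hubAdmin), the name-based rule is an assumption that is not derived from the schema, and the input is a plain list rather than a real config object.

``` r
library(jsonlite)

# Hypothetical helper: recursively mark length-1 atomic values as JSON scalars,
# leaving values under the names in `array_fields` as arrays.
mark_scalars <- function(x, array_fields = c("required", "optional")) {
  if (!is.list(x)) return(x)
  nms <- names(x)
  for (i in seq_along(x)) {
    el <- x[[i]]
    keep_array <- !is.null(nms) && nms[i] %in% array_fields
    if (is.list(el)) {
      x[[i]] <- mark_scalars(el, array_fields)
    } else if (is.atomic(el) && length(el) == 1L && !keep_array) {
      x[[i]] <- unbox(el)   # written as a scalar by toJSON()
    }
  }
  x
}

# Plain-list stand-in for part of a round (an assumption for illustration)
round <- list(
  round_id_from_variable = TRUE,
  round_id = "origin_date",
  task_ids = list(
    origin_date = list(required = NULL, optional = c("2023-01-02", "2023-01-09")),
    location = list(required = "US", optional = c("01", "02"))
  ),
  submissions_due = list(relative_to = "origin_date", start = -4L, end = 2L)
)

# null = "null" writes NULL as JSON null instead of {}
json <- toJSON(mark_scalars(round), null = "null", pretty = TRUE)
```

A rule based on field names cannot express everything in the schema, so the output would still need to be checked with validate_config().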

annakrystalli commented 10 months ago

So gutted that I don't have a better solution for you apart from indeed manually fixing the output 😫. I really hope we manage to resolve it at some point given the effort that went into the rest of the functionality. It just ended up a much more complex problem than I anticipated! 😭

annakrystalli commented 10 months ago

If you do find a solution let me know for sure!!

annakrystalli commented 3 weeks ago

Adding context from PR here:

In PR #30 I added a write_config() function for writing programmatically created task configurations (objects of class config) to JSON files.

It's been put off for some time because it's almost impossible to guarantee a valid config file, due to inconsistencies between R and JSON data types, in particular the fact that R has no concept of a scalar. As such, some properties in the output file will likely not conform to schema expectations. They might be an array when a scalar is required, or vice versa.

I tried to make use of the jsonvalidate::json_serialise() functionality, which can help serialise R objects in a schema-aware way, but I gave up because:

  1. I'm getting the following error when trying to run it with our schema and config, and I've not been able to debug it, even with the help of ChatGPT on the jsonvalidate JavaScript source code.
    Error: TypeError: Cannot convert undefined or null to object
  2. jsonvalidate::json_serialise() has for some time now (since last year) only been available in the development version of jsonvalidate, with no indication of when it might be pushed to CRAN.
  3. Ultimately, it would not be 100% guaranteed to be correct, because json_serialise() cannot make use of oneOf statements, which we use in our schema.
See code

``` r
library(hubAdmin)

rounds <- create_rounds(
  create_round(
    round_id_from_variable = TRUE,
    round_id = "origin_date",
    model_tasks = create_model_tasks(
      create_model_task(
        task_ids = create_task_ids(
          create_task_id("origin_date",
                         required = NULL,
                         optional = c(
                           "2023-01-02",
                           "2023-01-09",
                           "2023-01-16"
                         )
          ),
          create_task_id("location",
                         required = "US",
                         optional = c("01", "02", "04", "05", "06")
          ),
          create_task_id("horizon",
                         required = 1L,
                         optional = 2:4
          )
        ),
        output_type = create_output_type(
          create_output_type_mean(
            is_required = TRUE,
            value_type = "double",
            value_minimum = 0L
          )
        ),
        target_metadata = create_target_metadata(
          create_target_metadata_item(
            target_id = "inc hosp",
            target_name = "Weekly incident influenza hospitalizations",
            target_units = "rate per 100,000 population",
            target_keys = NULL,
            target_type = "discrete",
            is_step_ahead = TRUE,
            time_unit = "week"
          )
        )
      )
    ),
    submissions_due = list(
      relative_to = "origin_date",
      start = -4L,
      end = 2L
    )
  )
)
schema <- hubAdmin:::download_tasks_schema(format = "json")
config <- create_config(rounds)
jsonvalidate::json_serialise(config, schema)
#> Error in eval(expr, envir, enclos): TypeError: Cannot convert undefined or null to object
```

As such, I've added a warning about this in the docs, and write_config() issues a message on write advising users of the pitfalls and directing them to use validate_config() to validate the written config files and identify any deviations.
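
The kind of deviation that validation catches can be reproduced in isolation. The sketch below is an illustration only: it uses jsonvalidate::json_validate() directly with a toy schema invented for this example (far simpler than the hub tasks schema, and with no oneOf), and plain lists instead of config objects.

``` r
library(jsonlite)
library(jsonvalidate)

# Toy schema (an assumption for illustration, not the hub tasks schema)
schema <- '{
  "type": "object",
  "properties": {
    "round_id": {"type": "string"},
    "required": {"type": ["array", "null"]}
  }
}'

# Default serialisation: round_id is written as a one-element array
boxed <- as.character(toJSON(list(round_id = "origin_date", required = "US")))

# round_id explicitly marked as a scalar with unbox()
fixed <- as.character(toJSON(list(round_id = unbox("origin_date"), required = "US")))

# verbose = TRUE attaches the validation errors as an attribute on failure
boxed_ok <- json_validate(boxed, schema, engine = "ajv", verbose = TRUE)
fixed_ok <- json_validate(fixed, schema, engine = "ajv")

attr(boxed_ok, "errors")   # details of the failed "type" check on round_id
```

Here `boxed_ok` is FALSE and `fixed_ok` is TRUE. For real hub config files, validate_config() remains the check to run after writing.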