hubverse-org / hubAdmin

Utilities for administering hubverse Infectious Disease Modeling Hubs
https://hubverse-org.github.io/hubAdmin/

Validate that task ID data type is consistent across rounds #28

Open annakrystalli opened 1 month ago

annakrystalli commented 1 month ago

For a hub to be successfully accessed as an arrow dataset, column data types should not change from round to round. Most task IDs covered by our schema shouldn't change data type in later rounds, as their type is largely fixed by the schema. However, two kinds of task ID have the potential to vary between modeling tasks/rounds and to change over time:

  1. task IDs that accept more than one data type
  2. custom task IDs, which are beyond our control

Either could cause problems downstream. This is mainly a problem for parquet files (but has a small chance of causing problems in csvs too).
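To make the failure mode concrete, here is a minimal sketch of how mixed types could be detected in a hub config. It is written in Python purely for illustration and is not hubAdmin's API. It assumes a `tasks.json`-like layout of `rounds` → `model_tasks` → `task_ids`, where each task ID holds `required` and `optional` value lists. The function names (`json_type`, `task_id_types`, `inconsistent_task_ids`) and the example config are hypothetical.

```python
from collections import defaultdict


def json_type(value):
    """Map a value decoded from JSON to a coarse type label (assumed labels)."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "character"
    return type(value).__name__


def task_id_types(config):
    """Collect the set of value types seen for each task ID across all rounds."""
    seen = defaultdict(set)
    for rnd in config.get("rounds", []):
        for model_task in rnd.get("model_tasks", []):
            for name, spec in model_task.get("task_ids", {}).items():
                for key in ("required", "optional"):
                    # `required` / `optional` may be null (None) in a config.
                    for value in spec.get(key) or []:
                        seen[name].add(json_type(value))
    return dict(seen)


def inconsistent_task_ids(config):
    """Return the task IDs whose values span more than one type."""
    return {
        name: sorted(types)
        for name, types in task_id_types(config).items()
        if len(types) > 1
    }


# Hypothetical config: `horizon` is an integer in round 1 and a string in round 2.
example = {
    "rounds": [
        {"model_tasks": [{"task_ids": {
            "location": {"required": ["US"], "optional": None},
            "horizon": {"required": [1, 2], "optional": None},
        }}]},
        {"model_tasks": [{"task_ids": {
            "location": {"required": ["US"], "optional": ["CA"]},
            "horizon": {"required": ["1 wk", "2 wk"], "optional": None},
        }}]},
    ]
}

print(inconsistent_task_ids(example))
```

This sketch treats any mix of labels as inconsistent, including integer with double. A real check might instead allow types that can be safely coerced to a common one.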

Dynamic check for more than one data type in task ID columns

Develop a dynamic config-level validation check that: