Our current schema suggests that some task id values might accept multiple data types, e.g. scenario_id lists both "integer" and "string" here.
I had some questions about this:
How are we using these data types?
It seems possible/likely that this is only being used to validate that the hub has specified its tasks.json file correctly.
If I set up a hub where the scenario_id has an integer data type, will that column of model outputs be converted to an integer when I read in some model outputs? Would we want to do this?
If I set up a hub where the scenario_id has an integer data type, do we expect validations to throw an error if a model submission has encoded the value as "1" instead of 1, either via a data type specification in a parquet file or (possibly?) via quoting values in a csv file? Would we want to do this? (I think quoting values in csv files is pretty iffy; that's not really a data type specification so much as a csv formatting thing...)
I am basically wondering if the data type that shows up in a hub's tasks.json file for values of task id variables matters or should matter for downstream processing?
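
To make the csv quoting point concrete, here is a minimal sketch using only Python's standard library. It is not hub tooling: the two inline submissions and the `matches_declared_type` helper are hypothetical, made up for illustration. It shows that once a csv file is parsed, a quoted "1" and an unquoted 1 are the same string, so a check against a declared "integer" type could only ask whether the text is coercible, not how it was encoded.

```python
import csv
import io

# Two hypothetical submissions: same scenario_id, one unquoted and one quoted.
unquoted = "scenario_id,value\n1,0.5\n"
quoted = 'scenario_id,value\n"1",0.5\n'


def read_rows(text):
    """Parse csv text into a list of dicts; the csv module yields every field as str."""
    return list(csv.DictReader(io.StringIO(text)))


def matches_declared_type(raw, declared):
    """Hypothetical check: can the raw csv string be read as the declared type?"""
    if declared == "integer":
        try:
            int(raw)
            return True
        except ValueError:
            return False
    # Every csv field is already a string, so "string" always matches.
    return declared == "string"


rows_unquoted = read_rows(unquoted)
rows_quoted = read_rows(quoted)

# After parsing, the quoted and unquoted files are indistinguishable.
print(rows_unquoted == rows_quoted)
print(type(rows_quoted[0]["scenario_id"]).__name__)
print(matches_declared_type(rows_quoted[0]["scenario_id"], "integer"))
```

A parquet file is different in that the column type is stored in the file's schema, so a string-typed column and an integer-typed column can be told apart there even when the values look the same.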