hubverse-org / schemas

JSON schemas for modeling hubs
Creative Commons Zero v1.0 Universal

Define data types for sample `value` minimum and maximum properties #37

Open annakrystalli opened 1 year ago

annakrystalli commented 1 year ago

While working on v1.0.0, I noticed that the schema for the sample `value` `minimum` and `maximum` properties only has a description, with no type or other specification.

https://github.com/Infectious-Disease-Modeling-Hubs/schemas/blob/c73b02de6dc44af4251c90a239e47505e7a40f91/v1.0.0/tasks-schema.json#L946-L966

Could I get some clarity on what the `value` column for sample output types will hold? Is it target-dependent?
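To illustrate why a description-only property is a gap: in JSON Schema, `description` is an annotation and places no constraint on the instance, so a subschema containing only a description accepts any value. The Python sketch below evaluates just the `type` keyword against an untyped and a typed `minimum` subschema. It is a simplified stand-in for a real validator, and both schema fragments are hypothetical rather than copied from the v1.0.0 `tasks-schema.json`.

```python
import json

# Hypothetical fragments, not copied from the v1.0.0 tasks-schema.json.
# UNTYPED mirrors the issue (description only); TYPED adds a "type" keyword.
UNTYPED_MINIMUM = {"description": "Minimum allowable sample value."}
TYPED_MINIMUM = {"description": "Minimum allowable sample value.", "type": "number"}


def passes_type_keyword(instance, subschema):
    """Evaluate only the JSON Schema "type" keyword; all other keywords are ignored."""
    expected = subschema.get("type")
    if expected is None:
        # Annotation-only subschemas (e.g. just "description") accept anything.
        return True
    if isinstance(instance, bool):
        # Python's bool subclasses int, but JSON booleans are not numbers.
        return expected == "boolean"
    if expected == "boolean":
        return False
    if expected == "number":
        return isinstance(instance, (int, float))
    if expected == "integer":
        # JSON Schema also counts floats with no fractional part (e.g. 2.0) as integers.
        return isinstance(instance, int) or (
            isinstance(instance, float) and instance.is_integer()
        )
    if expected == "string":
        return isinstance(instance, str)
    raise ValueError(f"type {expected!r} not handled in this sketch")


for raw in ["0", '"zero"', "null"]:
    value = json.loads(raw)
    print(raw, passes_type_keyword(value, UNTYPED_MINIMUM), passes_type_keyword(value, TYPED_MINIMUM))
```

With the untyped subschema every value passes, including a string and `null`; with `"type": "number"` only `0` does.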

nickreich commented 1 year ago

I'm going to answer your question with a few of my own.

Doesn't the enum array specify that the types will be either double or integer?

I guess in practice I could envision a categorical target having "samples" that are the character strings of one of the category labels. But it seems the vision so far has been that samples would only be numeric, with the minimum and maximum specifying the range of allowable values.
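Under that numeric-only reading, checking sample values reduces to a type test plus an inclusive range test. The function below is a hypothetical sketch: the name `invalid_samples` and its parameters are not part of any hubverse package, and it assumes `value_type` takes the `double`/`integer` values from the enum and that `minimum`/`maximum` are inclusive bounds.

```python
import math


def invalid_samples(values, value_type, minimum=None, maximum=None):
    """Return the sample values that break the declared type or range.

    Hypothetical helper: value_type is "double" or "integer", and minimum /
    maximum are optional inclusive bounds (None means unbounded).
    """
    if value_type not in ("double", "integer"):
        raise ValueError(f"unsupported value_type: {value_type!r}")
    bad = []
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)) or math.isnan(v):
            bad.append(v)  # non-numeric (e.g. a category label) or NaN
        elif value_type == "integer" and not float(v).is_integer():
            bad.append(v)  # fractional value where an integer is required
        elif (minimum is not None and v < minimum) or (maximum is not None and v > maximum):
            bad.append(v)  # outside the allowed range
    return bad


print(invalid_samples([0, 3, 2.5, -1, "12"], "integer", minimum=0))
```

This prints `[2.5, -1, '12']`: a fractional value, a value below the minimum, and a string. A categorical "sample" such as `"high"` would always be flagged by this check, which is the case that would need a different rule.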

More broadly, I wonder if there should be some fields in the `target_metadata` that specify something like the allowable range of the variable, which might depend on the target type. For example, this seems like the place to specify a numeric range for a discrete or continuous target, or the allowable categories for a categorical target.
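One way to picture this `target_metadata` idea: each target entry carries either a numeric range or a list of allowed categories, and validation dispatches on the target type. The field names `value_range` and `categories`, the target type strings, and the two example targets below are all assumptions for illustration, not existing schema properties.

```python
def value_allowed(value, target_metadata):
    """Check one predicted value against hypothetical target_metadata fields."""
    target_type = target_metadata["target_type"]
    if target_type in ("continuous", "discrete"):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if target_type == "discrete" and not float(value).is_integer():
            return False
        # value_range is a hypothetical [low, high] pair; None means unbounded.
        low, high = target_metadata.get("value_range", (None, None))
        return (low is None or value >= low) and (high is None or value <= high)
    if target_type == "categorical":
        # categories is a hypothetical list of allowed labels.
        return value in target_metadata["categories"]
    raise ValueError(f"unknown target_type: {target_type!r}")


# Hypothetical targets, for illustration only.
inc_hosp = {"target_id": "inc hosp", "target_type": "discrete", "value_range": [0, None]}
flu_level = {"target_id": "flu level", "target_type": "categorical",
             "categories": ["low", "moderate", "high"]}

print(value_allowed(12, inc_hosp), value_allowed(-1, inc_hosp))
print(value_allowed("high", flu_level), value_allowed("severe", flu_level))
```

One appeal of this layout is that the constraint lives with the target, so the same range or category list could be reused by every output type submitted for that target.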

nickreich commented 1 year ago

Leaving a link to related Zoltar documentation on target types and output types: https://docs.zoltardata.com/targets/#valid-prediction-types-by-target-type

nickreich commented 1 year ago

A few more thoughts on this.

annakrystalli commented 1 year ago

I think this is worth a chat at the next meeting, as suggested on Slack. What I'd particularly like to understand better (and document in hubDocs) is when one should consider introducing a new modeling task item.

On another note, the Zoltar documentation has a lot of great content on targets that could be a good source for more detailed documentation on targets in hubDocs.

annakrystalli commented 1 year ago

Regarding the original question, if we need flexibility for this particular property, perhaps we could just add the `additionalProperties: true` keyword to the schema and provide instructions in the docs on how to encode checks for samples of different `output_type`s in `tasks.json` files.
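For reference on what that keyword controls: in JSON Schema, `additionalProperties: false` rejects object keys not listed under `properties`, while `true` (also the behavior when the keyword is omitted) allows them. The sketch below checks only that rule for boolean values of the keyword; the `sample_value` schema and the extra `categories` key are hypothetical.

```python
def unexpected_keys(instance, schema):
    """Return keys of `instance` that the schema's additionalProperties would reject.

    Sketch only: patternProperties and subschema-valued additionalProperties
    are not evaluated (a subschema is treated here as allowing every key).
    """
    if schema.get("additionalProperties", True) is not False:
        return []
    declared = set(schema.get("properties", {}))
    return sorted(key for key in instance if key not in declared)


# Hypothetical schema fragment and tasks.json entry.
sample_value = {
    "properties": {"minimum": {"type": "number"}, "maximum": {"type": "number"}},
    "additionalProperties": False,
}
entry = {"minimum": 0, "maximum": 100, "categories": ["low", "high"]}

print(unexpected_keys(entry, sample_value))  # ['categories']
sample_value["additionalProperties"] = True
print(unexpected_keys(entry, sample_value))  # []
```

The trade-off is that with `true` the schema itself no longer catches misspelled or unintended keys, so any extra checks would have to be documented and validated elsewhere.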