annakrystalli opened this issue 1 year ago
I'm going to answer your question with ones of my own.
Doesn't the enum array specify that the types will be either double or integer?
I guess in practice I could envision a categorical target having "samples" that are character strings, each one of the category labels. But it seems the vision here has been that samples would only be numeric, with the minimum and maximum specifying the range of allowable values.
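To make the numeric-only reading concrete, here is a minimal Python sketch of checking one sample value against a spec whose "type" is restricted to "double" or "integer", with optional "minimum"/"maximum" bounds. The spec shape and the check_sample_value helper are hypothetical illustrations, not the actual schema or any hubverse validation function.

```python
from numbers import Real

# Hypothetical spec shape for a sample "value": a "type" limited to
# "double" or "integer", plus optional "minimum"/"maximum" bounds.
# Not copied from the real tasks-schema.json.
ALLOWED_VALUE_TYPES = ("double", "integer")


def check_sample_value(value, spec):
    """Return a list of problems with one sample value (empty list = valid)."""
    value_type = spec["type"]
    if value_type not in ALLOWED_VALUE_TYPES:
        return [f"type {value_type!r} is not one of {ALLOWED_VALUE_TYPES}"]

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        return [f"{value!r} is not numeric"]
    if value_type == "integer" and not float(value).is_integer():
        return [f"{value!r} is not an integer"]

    # Range checks only apply when the spec declares a bound.
    problems = []
    if "minimum" in spec and value < spec["minimum"]:
        problems.append(f"{value!r} is below minimum {spec['minimum']}")
    if "maximum" in spec and value > spec["maximum"]:
        problems.append(f"{value!r} is above maximum {spec['maximum']}")
    return problems
```

Under this reading, a category label such as "low" fails the numeric check, which is why character-valued samples for a categorical target would need something beyond the "double"/"integer" enum.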
More broadly, I wonder if there shouldn't be some fields in the target_metadata that specify something like the allowable range of the variable, which might depend on the target type. E.g., it seems this would be the place to specify a numeric range for a discrete or continuous target, or the allowable categories for a categorical target.
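A rough sketch of what target-type-dependent allowable values could look like. The "value_range" and "categories" fields, the target IDs, and the allowed() helper are all hypothetical; none of them exist in the current target_metadata schema.

```python
# Hypothetical target_metadata entries. "value_range" and "categories"
# are illustrative field names only, not part of the current schema.
TARGET_METADATA = [
    {"target_id": "inc hosp", "target_type": "discrete",
     "value_range": {"minimum": 0}},
    {"target_id": "hosp trend", "target_type": "nominal",
     "categories": ["decrease", "stable", "increase"]},
]


def allowed(value, meta):
    """Check a value against the allowable values implied by its target type."""
    target_type = meta["target_type"]
    if target_type == "nominal":
        # Categorical targets: the value must be one of the declared labels.
        return value in meta["categories"]
    if target_type in ("continuous", "discrete"):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if target_type == "discrete" and not float(value).is_integer():
            return False
        # Numeric targets: an omitted bound means unbounded on that side.
        bounds = meta.get("value_range", {})
        lo = bounds.get("minimum", float("-inf"))
        hi = bounds.get("maximum", float("inf"))
        return lo <= value <= hi
    raise ValueError(f"unhandled target_type {target_type!r}")
```

The design point is that one field name per target type family keeps the check a simple dispatch on target_type.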
For reference, here is the related Zoltar documentation on target types and output types: https://docs.zoltardata.com/targets/#valid-prediction-types-by-target-type
A few more thoughts on this.
Suppose that, in the model_tasks > target_metadata > target_type field, one target is "nominal" and one is "continuous". Then, if you want them defined in the same task_id block, they have to have the same output_types, which might be constraining. E.g. you could maybe have a "mode" output type for both of them, and maybe a "sample" (although I note that sample does not currently accept "character" data types), but other output_types, e.g. "mean" or "cdf", would be meaningless for the "nominal" target. So in a setting where you wanted or needed different output types for two different targets, you'd need to make separate task_ids for them. I don't think this is necessarily a bad thing, but I just wanted to say it out loud.

I think this is worth a chat at the next meeting, as suggested on Slack. What I'd particularly like to better understand (and document in hubDocs) is when one should consider introducing a new modeling task item.
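A small Python sketch of the constraint described above: given the target types sharing a task_id block and the block's output_types, report which output types don't suit some target. The MEANINGFUL_OUTPUT_TYPES mapping only encodes the pairings mentioned in this comment and is an assumption, not an official table; the Zoltar page linked earlier in the thread has a fuller one.

```python
# Assumed, partial mapping of target_type -> output types that make sense,
# limited to the pairings discussed in this thread.
MEANINGFUL_OUTPUT_TYPES = {
    "nominal": {"mode", "sample"},
    "continuous": {"mean", "cdf", "mode", "sample"},
}


def incompatible_output_types(target_types, output_types):
    """Map each shared output type to the target types it doesn't suit."""
    problems = {}
    for output_type in sorted(output_types):
        bad = [t for t in target_types
               if output_type not in MEANINGFUL_OUTPUT_TYPES[t]]
        if bad:
            problems[output_type] = bad
    return problems


# A nominal and a continuous target forced to share one task_id block:
print(incompatible_output_types(["nominal", "continuous"],
                                {"mode", "sample", "mean", "cdf"}))
```

A non-empty result is the signal that those targets probably belong in separate task_id blocks.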
On another note, the Zoltar documentation has a lot of great content on targets that could be a good source for more detailed documentation on targets in hubDocs.
Regarding the original question, if we need flexibility for this particular argument, perhaps we could just add the additionalProperties: true keyword to the schema and provide instructions in the docs on how to encode checks for samples of different output_types in tasks.json files.
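For anyone less familiar with the keyword, here is a toy Python illustration of what additionalProperties does (boolean form only). SAMPLE_FRAGMENT is a hypothetical stand-in for the sample value schema, not copied from tasks-schema.json, and extra_keys_rejected is not a real validator.

```python
# Hypothetical stand-in for the sample "value" subschema.
SAMPLE_FRAGMENT = {
    "type": "object",
    "properties": {
        "type": {"enum": ["double", "integer"]},
        "minimum": {},
        "maximum": {},
    },
    "additionalProperties": True,
}


def extra_keys_rejected(schema, instance):
    """Return the keys of `instance` that `schema` would reject as undeclared.

    Toy model of boolean additionalProperties only; ignores patternProperties
    and the subschema form of the keyword.
    """
    declared = set(schema.get("properties", {}))
    extra = sorted(set(instance) - declared)
    # In JSON Schema, a missing additionalProperties behaves like `true`.
    if schema.get("additionalProperties", True) is False:
        return extra
    return []
```

With additionalProperties: true, a tasks.json entry could carry an extra key (say, a hypothetical "categories" list) without failing validation; with false, that key would be rejected.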
While working on v1.0.0, I noticed that the schema for the "sample" > "value" > "minimum" and "maximum" properties only has a description, with no type or other specification:
https://github.com/Infectious-Disease-Modeling-Hubs/schemas/blob/c73b02de6dc44af4251c90a239e47505e7a40f91/v1.0.0/tasks-schema.json#L946-L966
Could I get some clarity on what the value column for sample output types will hold? Is it target-dependent?
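Part of why the answer matters: in JSON Schema, "description" is an annotation and validates nothing, so a subschema consisting only of a description accepts any JSON value. A toy Python check of the "type" keyword shows the difference; passes_type is a hypothetical helper, not a full JSON Schema validator.

```python
# Map JSON Schema type names to Python types (toy subset of the spec).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def passes_type(subschema, value):
    """Return True if `value` satisfies the subschema's "type" keyword, if any."""
    if "type" not in subschema:
        return True  # e.g. description-only: nothing constrains the value
    allowed = subschema["type"]
    if isinstance(allowed, str):
        allowed = [allowed]
    for name in allowed:
        # bool is a subclass of int in Python, but not a JSON number.
        if isinstance(value, bool) and name != "boolean":
            continue
        if isinstance(value, JSON_TYPES[name]):
            return True
        # JSON Schema treats numbers like 2.0 as integers.
        if name == "integer" and isinstance(value, float) and value.is_integer():
            return True
    return False
```

As written, a description-only "minimum" would accept a string such as "increase" just as happily as a number.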