Closed duckontheweb closed 3 years ago
@Geoyi does this seem like the right way to define these? Also see #12 for the more specific model type definitions.
@drewbo @batic @calebrob6 Let me know if you have any feedback on this or #12
Perhaps "Weakly supervised" and "Self-supervised" would also make sense here?
Related, is there a part of the spec where an author can describe what is going on in the "Algorithm Type" / other fields? E.g. if I fine-tune a base, unsupervised model (e.g. trained with Tile2Vec) on a task using limited data, I might call the resulting model "Supervised", but would want to explain that (and reference the `model_id` of the base model).
For fine-tuning, we may need to add a new optional fragment called "base model" or a similar name.
> Perhaps "Weakly supervised" and "Self-supervised" would also make sense here?
This feels like we are getting into a proper taxonomy of the approaches... If we don't want to do that, perhaps using a lowest common denominator would be better.
Yeah great point! A lowest common denominator classification of algorithm types would lead to less confusion (as users might not agree on a broader taxonomy).
> Related, is there a part of the spec where an author can describe what is going on in the "Algorithm Type" / other fields? E.g. if I fine-tune a base, unsupervised model (e.g. trained with Tile2Vec) on a task using limited data, I might call the resulting model "Supervised", but would want to explain that (and reference the `model_id` of the base model).
We don't have anything in the spec for this right now, but I like the suggestion. I'll open a PR that adds it. I can think of a couple of ways to handle this; let me know which you prefer:
Add a top-level `description` field
This would be a free text field that allows the publisher to add a human-readable description of the model. This could include notes on the model architecture, model type, and really anything that the publisher deems relevant.
Move `model_type` and `algorithm_type` into a `type` object field that also includes an optional `description`. Something like:
"type": {
  "algorithm_type": "Supervised",
  "model_type": "Classification",
  "description": "<Free_text_description_here>"
}
I tend to lean towards the second option even though it's a bit more complex because it would be more obvious that there are additional notes associated with the model/algorithm type fields. It might not be obvious to users that they should look through the general description to find notes about the model/algorithm types.
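To make the second option concrete, here is a minimal Python sketch of how a consumer might read the nested `type` object. The names `ModelTypeInfo` and `summarize_type` and the example metadata are hypothetical and only illustrate the shape proposed above; none of this is part of the spec.

```python
from typing import Optional, TypedDict


class ModelTypeInfo(TypedDict, total=False):
    # Field names follow the proposal above; "description" is optional.
    algorithm_type: str
    model_type: str
    description: Optional[str]


def summarize_type(type_obj: ModelTypeInfo) -> str:
    """Build a one-line summary, appending the free-text note if present."""
    summary = f"{type_obj['algorithm_type']} / {type_obj['model_type']}"
    note = type_obj.get("description")
    if note:
        summary += f" ({note})"
    return summary


# Hypothetical metadata for a model fine-tuned from an unsupervised base model.
example: ModelTypeInfo = {
    "algorithm_type": "Supervised",
    "model_type": "Classification",
    "description": "Fine-tuned from an unsupervised base model",
}
print(summarize_type(example))
```

Because the note sits right next to the two type fields, a reader of the metadata does not need to search a general description to find it.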
I like the second option as well, although now, with both `algorithm_type` and `model_type` at the same nesting level, the name `algorithm_type` is bugging me even more than before. Perhaps `approach`? (This might be personal preference, so feel free to ignore completely.)
> the `algorithm_type` is even more "bugging" me than before. Perhaps `approach`?
Yeah, `algorithm_type` doesn't quite seem like the right term to me either, and having the `_type` suffix is sort of redundant in this new structure.
It seems like this field is really capturing the general approach to learning, so maybe one of these?
- `approach`
- `learning_approach`
- `technique`
- `learning_technique`
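As a rough sketch of what moving to the nested structure could look like for existing metadata, the hypothetical helper below moves the flat fields under `type` and renames `algorithm_type` to whichever candidate name is chosen. The function name `nest_type_fields` and the default of `approach` are assumptions for illustration only; the thread does not settle on a name here.

```python
def nest_type_fields(metadata: dict, approach_key: str = "approach") -> dict:
    """Return a copy of metadata with the flat type fields moved under "type".

    approach_key is the (hypothetical) replacement name for "algorithm_type".
    """
    # Copy everything except the two flat fields being relocated.
    result = {
        key: value
        for key, value in metadata.items()
        if key not in ("algorithm_type", "model_type")
    }
    nested = {}
    if "algorithm_type" in metadata:
        nested[approach_key] = metadata["algorithm_type"]
    if "model_type" in metadata:
        nested["model_type"] = metadata["model_type"]
    result["type"] = nested
    return result


flat = {"id": "example-model", "algorithm_type": "Supervised", "model_type": "Classification"}
print(nest_type_fields(flat, approach_key="learning_approach"))
```

Passing the new key as a parameter keeps the sketch neutral on which of the names above is preferred.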
Closing this via #21. If other learning approaches need to be added, we can open specific issues for those.
The main model spec defines a list of allowed Algorithm Type values that are designed to capture the high-level type of the model.
Opening this issue to discuss: