Algorithm types

duckontheweb commented 3 years ago

The main model spec defines a list of allowed Algorithm Type values that are designed to capture the high-level type of the model.

Opening this issue to discuss:

- Whether "Algorithm Type" is the appropriate name for this field
- Whether there should be changes to the listed values
- Whether we should allow other, user-defined values or require one of the listed values
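
The last question, a closed list versus open values, can be made concrete with a small sketch. The value set below is an assumption for illustration only (the spec's actual list is not reproduced in this thread), and `check_algorithm_type` is a hypothetical helper, not part of the spec.

```python
# Illustrative values only; the spec's actual list may differ.
LISTED_VALUES = {"Supervised", "Unsupervised"}


def check_algorithm_type(value: str, allow_custom: bool = False) -> bool:
    """Return True if `value` is acceptable under the chosen policy.

    allow_custom=False: only values from the listed set are accepted.
    allow_custom=True: any non-empty string is also accepted.
    """
    if value in LISTED_VALUES:
        return True
    return allow_custom and bool(value.strip())
```

Under the strict policy, a value such as "Self-supervised" would be rejected until it is added to the list; under the permissive policy it would pass, at the cost of publishers possibly using inconsistent names.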

duckontheweb commented 3 years ago

@Geoyi does this seem like the right way to define these? Also see #12 for the more specific model type definitions.

duckontheweb commented 3 years ago

@drewbo @batic @calebrob6 Let me know if you have any feedback on this or #12

calebrob6 commented 3 years ago

Perhaps "Weakly supervised" and "Self-supervised" would also make sense here?

Related, is there a part of the spec where an author can describe what is going on in the "Algorithm Type" / other fields? E.g. if I fine-tune a base, unsupervised model (e.g. trained with Tile2Vec) on a task using limited data, I might call the resulting model "Supervised", but would want to explain that (and reference the model_id of the base model).

HamedAlemo commented 3 years ago

For fine-tuning, we may need to add a new optional fragment for "base model" or a similar name.
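
One possible shape for such a fragment, sketched in Python. The `base_model` key, its `model_id` sub-field, and the placeholder ids are all hypothetical names chosen for illustration; nothing like this is defined in the spec at this point.

```python
from typing import Optional

# Hypothetical metadata for a fine-tuned model. "base_model" is an
# assumed field name, and both ids are placeholders.
fine_tuned = {
    "model_id": "example-finetuned-v1",
    "algorithm_type": "Supervised",
    "base_model": {"model_id": "example-unsupervised-base"},
}


def base_model_id(metadata: dict) -> Optional[str]:
    """Return the id of the referenced base model, or None if absent."""
    base = metadata.get("base_model")
    return base.get("model_id") if base else None
```

Keeping the fragment optional means models trained from scratch would simply omit it.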

batic commented 3 years ago

> Perhaps "Weakly supervised" and "Self-supervised" would also make sense here?

This feels like we are getting into a proper taxonomy of the approaches... If we don't want to do that, perhaps using a lowest common denominator would be better.

calebrob6 commented 3 years ago

Yeah great point! A lowest common denominator classification of algorithm types would lead to less confusion (as users might not agree on a broader taxonomy).

duckontheweb commented 3 years ago

> Related, is there a part of the spec where an author can describe what is going on in the "Algorithm Type" / other fields? E.g. if I fine-tune a base, unsupervised model (e.g. trained with Tile2Vec) on a task using limited data, I might call the resulting model "Supervised", but would want to explain that (and reference the model_id of the base model).

We don't have anything in the spec for this right now, but I like the suggestion. I'll get a PR in that adds it. I can think of a couple of ways to handle this; let me know which you prefer:

1. Add a top-level description field

   This would be a free-text field that allows the publisher to add a human-readable description of the model. This could include notes on the model architecture, model type, and really anything that the publisher deems relevant.

2. Move model_type and algorithm_type into a type object field that also includes an optional description. Something like:

"type": {
   "algorithm_type": "Supervised",
   "model_type": "Classification",
   "description": "<Free_text_description_here>"
}

I tend to lean towards the second option even though it's a bit more complex because it would be more obvious that there are additional notes associated with the model/algorithm type fields. It might not be obvious to users that they should look through the general description to find notes about the model/algorithm types.
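
To illustrate what the second option would mean for existing metadata, here is a minimal sketch that moves the two flat fields into the nested object. It assumes the fields currently sit at the top level of the metadata; `to_nested_type` is a hypothetical helper name, not part of the spec or any tooling.

```python
def to_nested_type(metadata: dict, description: str = "") -> dict:
    """Return a copy with algorithm_type/model_type moved under "type"."""
    # Copy every field except the two that move into the nested object.
    result = {
        key: value
        for key, value in metadata.items()
        if key not in ("algorithm_type", "model_type")
    }
    type_obj = {
        "algorithm_type": metadata["algorithm_type"],
        "model_type": metadata["model_type"],
    }
    if description:  # the description is optional in the proposal
        type_obj["description"] = description
    result["type"] = type_obj
    return result
```

For example, `to_nested_type({"algorithm_type": "Supervised", "model_type": "Classification"}, "Fine-tuned from an unsupervised base model")` would yield a `type` object with all three keys, keeping the explanatory note right next to the fields it describes.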

batic commented 3 years ago

I like the second option as well, although now, with both algorithm_type and model_type at the same nested level, the name algorithm_type is "bugging" me even more than before. Perhaps approach? (This might be personal preference, so feel free to ignore it completely.)

duckontheweb commented 3 years ago

> the algorithm_type is even more "bugging" me than before. Perhaps approach?

Yeah, algorithm_type doesn't quite seem like the right term to me either, and the _type suffix is somewhat redundant in this new structure.

It seems like this field is really capturing the general approach to learning, so maybe one of these?

- approach
- technique
- learning_approach
- learning_technique

duckontheweb commented 3 years ago

Closing this via #21. If other learning approaches need to be added, we can open specific issues for those.

radiantearth / geo-ml-model-catalog

Algorithm types #11