radiantearth / geo-ml-model-catalog

Geospatial ML Model Catalog Spec
Apache License 2.0

Algorithm types #11

Status: closed (duckontheweb closed this issue 3 years ago)

duckontheweb commented 3 years ago

The main model spec defines a list of allowed Algorithm Type values that are designed to capture the high-level type of the model.
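As a rough illustration of what "a list of allowed values" means in practice, here is a minimal sketch that validates an algorithm type string against a fixed set. The enum members are assumptions: only "Supervised" appears verbatim in this thread (unsupervised models come up in passing), and the spec's actual list is not reproduced here.

```python
from enum import Enum


class AlgorithmType(Enum):
    # Hypothetical subset of allowed values; the spec's real list may differ.
    SUPERVISED = "Supervised"
    UNSUPERVISED = "Unsupervised"


def validate_algorithm_type(value: str) -> AlgorithmType:
    """Return the matching AlgorithmType, or raise ValueError if not allowed."""
    try:
        # Enum lookup by value raises ValueError for unknown values.
        return AlgorithmType(value)
    except ValueError:
        allowed = ", ".join(member.value for member in AlgorithmType)
        raise ValueError(
            f"{value!r} is not an allowed algorithm type (expected one of: {allowed})"
        ) from None
```

With a closed list like this, a value such as "Weakly supervised" is rejected until it is explicitly added, which is why proposals for new values need their own discussion.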

Opening this issue to discuss:

duckontheweb commented 3 years ago

@Geoyi does this seem like the right way to define these? Also see #12 for the more specific model type definitions.

duckontheweb commented 3 years ago

@drewbo @batic @calebrob6 Let me know if you have any feedback on this or #12.

calebrob6 commented 3 years ago

Perhaps "Weakly supervised" and "Self-supervised" would also make sense here?

Related, is there a part of the spec where an author can describe what is going on in the "Algorithm Type" / other fields? E.g. if I fine-tune a base, unsupervised model (e.g. trained with Tile2Vec) on a task using limited data, I might call the resulting model "Supervised", but would want to explain that (and reference the model_id of the base model).

HamedAlemo commented 3 years ago

For fine-tuning, we may need to add a new optional fragment for "base model" or something similarly named.
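One way to picture such a fragment: a fine-tuned model's entry carries an optional reference to the model_id of the model it was fine-tuned from. This is a minimal sketch under assumptions; the `base_model` key, its `{"model_id": ...}` shape, and the helper names are hypothetical (the thread does not settle on a name), and only model_id comes from the discussion.

```python
from typing import Optional


def base_model_id(entry: dict) -> Optional[str]:
    """Return the base model's model_id, or None if the entry has no base model.

    Assumes a hypothetical optional "base_model" fragment of the form
    {"model_id": "<id>"}; this key is illustrative, not part of the spec.
    """
    fragment = entry.get("base_model")
    if fragment is None:
        return None
    return fragment["model_id"]


def lineage(model_id: str, catalog: dict) -> list:
    """Follow hypothetical base_model references back to the root model."""
    chain = [model_id]
    seen = {model_id}
    current = base_model_id(catalog[model_id])
    while current is not None:
        # Guard against malformed catalogs where models reference each other.
        if current in seen:
            raise ValueError(f"cycle in base_model references at {current!r}")
        chain.append(current)
        seen.add(current)
        current = base_model_id(catalog[current])
    return chain
```

For example, a catalog keyed by (made-up) IDs such as "tile2vec-base" and "landcover-finetuned", where the latter points at the former, yields `["landcover-finetuned", "tile2vec-base"]`. Keeping the reference optional means models trained from scratch need no extra fields.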

batic commented 3 years ago

Perhaps "Weakly supervised" and "Self-supervised" would also make sense here?

This feels like we are getting into a proper taxonomy of the approaches... If we don't want to do that, perhaps using a lowest common denominator would be better.

calebrob6 commented 3 years ago

Yeah, great point! A lowest-common-denominator classification of algorithm types would lead to less confusion (since users might not agree on a broader taxonomy).

duckontheweb commented 3 years ago

Related, is there a part of the spec where an author can describe what is going on in the "Algorithm Type" / other fields? E.g. if I fine-tune a base, unsupervised model (e.g. trained with Tile2Vec) on a task using limited data, I might call the resulting model "Supervised", but would want to explain that (and reference the model_id of the base model).

We don't have anything in the spec for this right now, but I like the suggestion. I'll open a PR that adds it. I can think of a couple of ways to handle this; let me know which you prefer:

  1. Add a top-level description field

    This would be a free-text field that lets the publisher add a human-readable description of the model. This could include notes on the model architecture, the model type, and anything else the publisher deems relevant.

  2. Move model_type and algorithm_type into a type object field that also includes an optional description. Something like:

    "type": {
       "algorithm_type": "Supervised",
       "model_type": "Classification",
       "description": "<Free_text_description_here>"
    }

I lean towards the second option, even though it's a bit more complex, because it makes it more obvious that there are additional notes associated with the model/algorithm type fields. Users might not realize that they should look through a general description to find notes about the model/algorithm types.
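To make the second option concrete, here is a minimal sketch that moves flat `algorithm_type` and `model_type` fields into a single `type` object with an optional `description`. The helper name, the assumption that both fields start at the top level of the record, and the example `model_id` value are all hypothetical.

```python
import json
from typing import Optional


def nest_type_fields(record: dict, description: Optional[str] = None) -> dict:
    """Return a copy of record with algorithm_type/model_type grouped under "type".

    Hypothetical helper: assumes both fields currently sit at the top level.
    """
    result = dict(record)  # shallow copy so the input record is left untouched
    type_obj = {
        "algorithm_type": result.pop("algorithm_type"),
        "model_type": result.pop("model_type"),
    }
    if description is not None:
        # Notes about the types live right next to the fields they explain.
        type_obj["description"] = description
    result["type"] = type_obj
    return result


flat = {
    "model_id": "example-model",  # hypothetical ID for illustration
    "algorithm_type": "Supervised",
    "model_type": "Classification",
}
print(json.dumps(nest_type_fields(flat, "Fine-tuned from an unsupervised base model."), indent=2))
```

Because the description is only added when provided, records without notes keep a `type` object with just the two type fields.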

batic commented 3 years ago

I like the second option as well, although now, with both algorithm_type and model_type at the same nesting level, the algorithm_type is even more "bugging" me than before. Perhaps approach? (This might be personal preference, so feel free to ignore it completely.)

duckontheweb commented 3 years ago

the algorithm_type is even more "bugging" me than before. Perhaps approach?

Yeah, algorithm_type doesn't quite seem like the right term to me either, and the _type suffix is somewhat redundant in this new structure.

It seems like this field is really capturing the general approach to learning, so maybe one of these?

duckontheweb commented 3 years ago

Closing this via #21. If other learning approaches need to be added, we can open specific issues for those.