stac-extensions / ml-model

An Item and Collection extension to describe machine learning (ML) models that operate on Earth observation data.
Apache License 2.0

Model types #12

Open pieschker opened 2 years ago

pieschker commented 2 years ago

The main model spec defines a list of allowed Model Type values that are designed to capture the type of model in more detail than the Algorithm Type.

Opening this issue to discuss the following:

originally posted by @duckontheweb

Comments

Is "Model Type" the appropriate name for this field?

It seems reasonable! "Target type" might be another appropriate name.

Two thoughts here:

Multi-task models do not fit well. For example, models trained on the xBD dataset (https://arxiv.org/pdf/1911.09296.pdf) might need to segment buildings and predict image level damage labels. Here a model might be segmentation and classification!

Unsupervised models that use some sort of contrastive loss also do not fit well. For example, if an unsupervised model uses a triplet loss what should its "Model type" be?


Good points Caleb.

For the multi-task models, the solution can be to allow "Model Type" to have a list of types. For the unsupervised example, doesn't it fall under "Dimensionality Reduction"? We can also add "Embedding", but it's kind of the same thing. Also note that "Model Type" has a set of pre-defined values, but for edge cases like this the modeler can use or suggest a different type.


For the unsupervised example, doesn't fall under "Dimensionality Reduction"?

https://github.com/radiantearth/geo-ml-model-catalog/pull/21 changed the model_type field to a prediction_type field in the model_type object. We don't currently have "Dimensionality Reduction" listed as an option in the "Prediction Type" section, but maybe we should. @calebrob6 if we add this, it seems like your suggestion of using target_type might be more appropriate than using prediction_type since dimensionality reduction doesn't really represent a "prediction."


In computer vision, and in the ML field generally, the term task is more usual than model_type. Regarding the listed values, the terms Instance detection / Instance segmentation are sometimes also used.

rbavery commented 1 year ago

Re-upping this!

I think "task(s)" is better given that more models are being released as multi-modal, or just the weights for a baseline model are released and different heads can be attached for different tasks.

Maybe the acceptable types for this field should then also change from a string to either a string or a list of strings?
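
To make the "string or list of strings" idea concrete, here is a minimal sketch of how a consumer could handle such a field. Nothing here comes from the spec: the names `SUGGESTED_TASKS`, `normalize_tasks` and `custom_tasks` are hypothetical, and the suggested values are only terms that came up in this thread, not an agreed list.

```python
from typing import List, Union

# Hypothetical set of suggested values, built only from terms mentioned in
# this thread. It is an assumption for illustration, not the spec's list.
SUGGESTED_TASKS = {
    "segmentation",
    "classification",
    "dimensionality-reduction",
    "embedding",
}


def normalize_tasks(value: Union[str, List[str]]) -> List[str]:
    """Accept a single string or a list of strings and return a list."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError("expected a string or a list of strings")


def custom_tasks(value: Union[str, List[str]]) -> List[str]:
    """Return the values outside the suggested set.

    These are not rejected, since edge cases may need a different type;
    they are only reported so a reviewer can look at them.
    """
    return [t for t in normalize_tasks(value) if t not in SUGGESTED_TASKS]
```

With this shape, a multi-task model could declare `["segmentation", "classification"]`, a single-task model could keep a plain string, and an unusual value would be flagged rather than refused.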