stac-extensions / ml-model

An Item and Collection extension to describe machine learning (ML) models that operate on Earth observation data.
Apache License 2.0

Model "file type"? #5

Open m-mohr opened 2 years ago

m-mohr commented 2 years ago

Disclaimer: I'm not so much into ML models, but have a use case ;-)

We will have several providers that aim to implement this extension for models they generate with specific software in their infrastructure (e.g. random forest). Other providers may then read these results. I've been told that, depending on the software used to generate them, these models may be stored in different types of model files, so that only some software can read a given file while other software cannot, for example:

How can I know from the model metadata whether I can read the exposed model file with my software? @duckontheweb Maybe this is easy to answer and just a matter of reading a different media type or so, but I want to ensure this is considered. :-)

Related issue: https://github.com/Open-EO/openeo-processes/issues/300

m-mohr commented 2 years ago

Thoughts @duckontheweb ?

duckontheweb commented 2 years ago

Sorry I missed this the first time around @m-mohr!

I think the easiest way to handle this is through some combination of media types and roles or relation types (depending on whether we are dealing with an Asset or a Link). In some ways, using media types would be preferable because it would work for both Assets and Links. However, it seems like most model artifacts do not have an official IANA media type, so we would have to define our own within the spec.

I recently added the "ml-model:checkpoint" role to handle the case of PyTorch checkpoint files as assets, but if we continue with this approach we would need to define a new role for each type of model file, which could be cumbersome.
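To make the media type / role idea concrete, here is a rough sketch of how a client might decide which assets it can read. It is only an illustration under assumptions: the media type `application/x-pytorch-checkpoint` is made up (it is neither registered with IANA nor defined by this extension), and the helper `readable_assets`, the asset keys and the example URLs are hypothetical. Only the "ml-model:checkpoint" role comes from the extension.

```python
# ASSUMPTION: this media type is invented for illustration; it is not
# registered with IANA and not defined by the ml-model extension.
SUPPORTED_MEDIA_TYPES = {"application/x-pytorch-checkpoint"}

# The "ml-model:checkpoint" role is the one mentioned above.
SUPPORTED_ROLES = {"ml-model:checkpoint"}


def readable_assets(item: dict) -> list[str]:
    """Return the keys of assets whose media type or roles this client understands."""
    keys = []
    for key, asset in item.get("assets", {}).items():
        media_type_ok = asset.get("type") in SUPPORTED_MEDIA_TYPES
        role_ok = bool(SUPPORTED_ROLES.intersection(asset.get("roles", [])))
        if media_type_ok or role_ok:
            keys.append(key)
    return keys


# Hypothetical STAC Item fragment with two model assets.
item = {
    "assets": {
        "weights": {
            "href": "https://example.com/model.pt",
            "type": "application/x-pytorch-checkpoint",
            "roles": ["ml-model:checkpoint"],
        },
        "forest": {
            "href": "https://example.com/model.rds",
            "type": "application/octet-stream",
            "roles": ["data"],
        },
    }
}

print(readable_assets(item))  # ['weights']
```

The second asset shows the problem with the role-only approach: without a dedicated role or media type for that file format, the client has no way to tell whether it can load it.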

I will put some more thought into this, but I'm curious if others have any insight into a better approach.

m-mohr commented 2 years ago

I just had the idea to use "processing:software" (on assets?) to specify the software that wrote the file, but additional roles or media types could also work. I guess we need to investigate this a bit more. We will probably also experiment with it in openEO Platform, see what works for us, and propose that as a potential solution.
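As a rough sketch of what the "processing:software" idea could look like from the reader's side: the processing extension defines this field as a map of software name to version, but whether it should be set on model assets is exactly the open question here. The software names, the helper `written_by_supported_software` and the example asset below are hypothetical.

```python
# ASSUMPTION: names of software whose model files this (hypothetical) client can load.
READABLE_SOFTWARE = {"scikit-learn", "ranger"}


def written_by_supported_software(asset: dict) -> bool:
    """Check whether any software listed in processing:software is one we can read from."""
    # processing:software maps software name -> version string.
    software = asset.get("processing:software", {})
    return any(name in READABLE_SOFTWARE for name in software)


# Hypothetical model asset that declares the software that wrote it.
asset = {
    "href": "https://example.com/model.bin",
    "roles": ["ml-model:checkpoint"],
    "processing:software": {"scikit-learn": "1.3.0"},
}

print(written_by_supported_software(asset))  # True
```

One caveat with this approach: the software name alone does not identify the serialization format, since the same software may be able to write several file types, so it would probably need to be combined with a role or media type.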