Revisiting the ML Model extension

rbavery commented 1 year ago

I'm opening this ticket to gauge interest in the ML Model extension and see if others find value in updating this extension given many recent developments in ML, namely:

- a greater diversity of tasks, learning approaches, and asset types associated with ML models
- a greater diversity of compute architectures used for training and inference (AMD, TPU, NVIDIA GPU, Apple Metal).
- Is "OS" still a relevant field, given that this extension uses Docker?
- the proliferation of ML models in the Earth observation industry, while we still lack a popular metadata standard for indexing and searching geospatial ML models

Are there plans to revamp and maintain something like this in the future as part of the transition from Radiant Earth MLHub to Source Cooperative? @kbgg

kbgg commented 1 year ago

Hey @rbavery, we don't have any plans to continue development of this extension but are more than happy to hand this over to you as a maintainer if you'd like!

weiji14 commented 1 year ago

Hi @kbgg, I'd be keen to help maintain this STAC extension on behalf of DevSeed. We've discussed internally that this ml-model extension has a lot of potential, and we would love to take it to the next stage. If you could add my username @weiji14 to the repo, that would be great.

Also happy to share maintenance rights with other folks who are interested. I can mention this in the Pangeo ML Working Group meeting next month to see who else may be interested.

rbavery commented 1 year ago

I'm interested in helping with maintenance as well; my handle is @rbavery. I have some initial work on a pystac ml-model extension that I'd like to propose alongside updates to this spec when it is ready.

kbgg commented 1 year ago

I don't think I have permissions to add anyone to the repository, @m-mohr?

HamedAlemo commented 1 year ago

It's great to see growing interest in this! It turns out I'm still the admin on this repo. I added you all, and @kbgg I also made you an admin.

@rbavery @weiji14 we will be happy to collaborate on this, but more on the user side to test new versions of the metadata for models.

PondiB commented 1 year ago

@rbavery and @weiji14, are there any ongoing behind-the-scenes conversations? @HamedAlemo, please include me as well, as I would like to help with maintenance; my handle is @PondiB.

rbavery commented 1 year ago

Hi @PondiB! I've recently changed jobs and am focusing on all things geo ML at wherobots.ai. I may contribute back to this, but I'm not part of any behind-the-scenes conversations right now.

To get conversations going in the open, I created a public slack channel in the Cloud Native Geo Foundation Slack that anyone is welcome to join: https://join.slack.com/t/cloudnativegeo/shared_invite/zt-235w8flfo-TW5Tpi1sPqQFWeMy~7ROHA

PondiB commented 1 year ago

@rbavery, well noted, and thanks for the Slack channel link.

rbavery commented 11 months ago

Hi all, I spoke with @fmigneault-crim about another ml extension project he and others built: https://github.com/crim-ca/dlm-extension

It's farther along and more up to date than this repo. I suggest we archive this repo in favor of https://github.com/crim-ca/dlm-extension. If https://github.com/crim-ca/dlm-extension gains more adoption, a next step would be to move it to the stac-extensions org. I think the next maturity threshold is 3 organizations using it. I'm planning to build a validation library with Pydantic v2 for the DLM extension and use it to track models at Wherobots.ai.

m-mohr commented 11 months ago

Can the other repo enable the issue tracker? I have a couple of comments ;)

PondiB commented 11 months ago

@rbavery, I appreciate the reference to the repo currently under development by @fmigneault-crim and the team. I will look into it in detail. However, I have a question about its primary focus on "Deep Learning". Is there a plan to extend its scope to pixel-based machine learning models? If so, I think it should be renamed.

fmigneault commented 11 months ago

@m-mohr done

fmigneault commented 11 months ago

@PondiB The definitions are made to allow pixel-based ML models (although I would like further validation against actual use cases, if some can be provided, to add any missing or relevant fields). The definitions are generic enough to allow other model variations as well, such as ROI classification/detection. The "Deep Learning" name was chosen to avoid confusion with ML-model (https://github.com/stac-extensions/ml-model), which was already taken.

m-mohr commented 11 months ago

@fmigneault Would it make sense to copy over the DLM extension content into the ml-model repository and release it as a 2.0.0 eventually?

fmigneault commented 11 months ago

That could make sense. Do other projects plan to keep using ml-model? I have not evaluated whether everything in the current ml-model can be fully ported to the DLM definitions.

HamedAlemo commented 11 months ago

I agree with replacing the existing version with the DLM extension, and perhaps keeping the name ml-model so it stays generic and inclusive. DLM as it stands is certainly more up to date. I don't think any organization actively uses ml-model now. We created it for models hosted on Radiant MLHub, but since the transition to Source Cooperative, no models are cataloged anymore. cc @kbgg.

devisperessutti commented 10 months ago

We are very interested in this discussion and hope we can contribute to taking the extension further.

We reviewed the currently available STAC ML specifications to compare them and identify their main strengths and limitations. Report available here. We also found DLM to be more up to date and complete, but had doubts about whether it generalizes to any machine learning method rather than only deep learning.

We created two STAC Items, one for a DL model and one for a LightGBM model (as we couldn't find complete examples). For the pixel-based GBM, the required final_layer_size field in dlm:outputs doesn't apply, while most of the other fields could be applied (although the required dlm:architecture and dlm:inputs fields might be too strict).

rbavery commented 10 months ago

That's great feedback @devisperessutti thanks! I've worked on revamping the DLM to be more general and not bake in required fields that are particular to deep learning or a specific ML framework.

It still has fields, marked optional, that cater to the deep learning/computer vision community, since I expect they will need fields similar to final_layer_size.

Any feedback on this PR or the associated HackMD doc would be super valuable for pushing this extension forward: https://github.com/crim-ca/dlm-extension/pull/2 https://hackmd.io/DBRF1sQCS1WmSqygJNKQJQ?view

Right now we're seeking comments and looking to resolve open questions about how this extension should be referenced, how deeply nested extension objects should be, and other points.

Once a critical mass of folks are aligned on the extension, we can bring it into this repository and highlight many community examples.

fmariv commented 9 months ago

From @earthpulse we are also very interested in this discussion and we would be more than happy to get involved in the project and contribute!

To give context: we are in the consortium that develops and maintains the EOTDL, where anyone can create, share and use training datasets for EO ML applications. We have adopted STAC as our core specification and have already worked with it (example, ml-dataset extension), and were thinking about developing a new STAC extension for ML models. However, it seemed better and more viable to help and contribute to the extensions that already exist, such as DLM. We have found it up to date and quite complete, but we also have doubts about whether it generalizes to any machine learning method rather than only deep learning, and that generalization is what we really need. There are some missing features and elements we'd need in order to align the extension with our approach, and we'll be glad to discuss them further.

So, we support the development of DLM and the replacement of ml-model (perhaps with a name change to avoid confusion?), and will be glad to contribute. @rbavery, please reach out to us so we can start contributing!

cc. @juansensio (CTO EarthPulse)

fmigneault commented 9 months ago

@fmariv Good to hear more users are interested in the project. If you could provide feedback directly on the work @rbavery already started in https://github.com/crim-ca/dlm-extension/pull/2, that would help us understand the current pain points in DLM that should be adjusted to generalize to other ML algorithms.

rbavery commented 9 months ago

Ditto what @fmigneault said! You can comment directly in this markdown document if you prefer, or a code review on the PR would also be great.

I'm also down to discuss this extension on a video meet if it helps incorporate feedback quicker and advance this as a community standard. Feel free to book a 30-min meet on my calendar here: https://calendly.com/ryan-at-wherobots/30min

substitution of ml-model (perhaps changing the name, to avoid confusion?)

I'm open to changing the name. Currently we have named the extension the Machine Learning Model Extension in https://github.com/crim-ca/dlm-extension/pull/2 and we were thinking we would release it as version 2 in this repository once we have some examples ready and incorporate more community feedback.

rbavery commented 9 months ago

I met with @fmariv, and put together a roadmap for v2: https://github.com/crim-ca/dlm-extension/issues/7

I think once these items are complete, this will be ready (or close to ready) to merge into this repo! Feel free to comment or open issues at https://github.com/crim-ca/dlm-extension/issues

As a reminder, this markdown document is the up-to-date doc of the schema, and it is open for comment. You can ping me on GitHub here or in the Cloud Native Geo Slack channel ml-stac, and I'll review and respond to comments.

rbavery commented 9 months ago

Hi all, we're close to wrapping up version 2 of this schema and an accompanying library to generate STAC metadata. Could I be made owner of this repo so that I can add @fmigneault and update other repo settings? Not sure who has the power to grant this!

Also, we will be giving a short presentation at the next STAC Community Meeting to introduce the new ML Model Extension if folks are interested in learning how to document their models and associate them with other STAC objects.

Join info below; everyone is welcome to come ask questions and give feedback. We might do some more focused sessions on the ML Model related extensions in the future if there's interest:

STAC Community Meetup
Monday, March 11 · 8:00-9:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/gma-vujm-sbi
Or dial: (US) +1 252-986-3093 PIN: 785 785 181#
More phone numbers: https://tel.meet/gma-vujm-sbi?pin=7110281050917

m-mohr commented 9 months ago

Upgraded you to Admin @rbavery

@PondiB The updates might be of interest to you.

PondiB commented 9 months ago

@PondiB The updates might be of interest to you.

Thanks for the tag. I am on vacation until April. I will try to attend the meeting on March 11.

fmigneault commented 8 months ago

I invite anyone working with ML and annotations in STAC to show interest in this: https://github.com/stac-utils/pystac/issues/1313

fmigneault commented 7 months ago

Hi everyone. The long-running PR (https://github.com/crim-ca/dlm-extension/pull/2) for the new Machine Learning Model (MLM) extension is now merged!

Multiple STAC Item examples (https://github.com/crim-ca/dlm-extension/tree/main/examples) are provided with validation against the MLM schema (https://github.com/crim-ca/dlm-extension/blob/main/json-schema/schema.json) while making use of other STAC extensions at the same time.

A pydantic+pystac compatible tool is available here: https://github.com/crim-ca/dlm-extension/tree/main/stac_model (see repo root for pyproject.toml, etc. for installation)

If you are interested in providing more examples (or want clarifications about the provided examples), let me know through issues.

rbavery commented 7 months ago

I'll publish a release for https://pypi.org/project/stac-model and add you as a co-maintainer @fmigneault if that sounds good!

I can move everything from dlm-extension, with a slightly updated README reflecting the extension's new home. Is now a good time to do that?

fmigneault commented 7 months ago

@rbavery Yes, you can release a version for stac-model and add me to the co-maintainers. For the move, I think a fork under stac-extensions would work? I think it is something to discuss during the next community meeting since we'll have to propose deprecating ml-model at the same time.

rbavery commented 7 months ago

Down to discuss!

We (at Wherobots) would rather have the stac-extensions org host the canonical repo for the extension. This way, no single org is seen as owning the maintenance of the extension, and, as with other stac-extensions repos, all issues and discussions happen within the org's version of the repo. Compared with forking, this might make it clearer to potential contributors and users that this extension follows the same open maintenance and contribution practices as other stac-extensions.

fmigneault commented 7 months ago

Relevant PR: #16

fmigneault commented 1 month ago

I think most items discussed here have been addressed by https://github.com/stac-extensions/mlm. If any remain, a specific issue can be opened to discuss them in more detail.

stac-extensions / ml-model

Revisiting the ML Model extension #13