rstudio / vetiver-python

Version, share, deploy, and monitor models.
https://rstudio.github.io/vetiver-python/stable/
MIT License

Compare multiple models and monitor their stage #188

Open GitHunter0 opened 1 year ago

GitHunter0 commented 1 year ago

Hey folks, vetiver is a pretty nice and straightforward tool to use.

The only important thing I'm missing is the ability to compare multiple models and monitor the stage (e.g. 'production') of each model (as MLflow does). Do you have plans to add that, or is it beyond the scope of vetiver?

And just a side question: in VetiverAPI(), is it possible to use a custom endpoint name instead of the default 'predict'?

Thank you

isabelizimm commented 12 months ago

Hey there--thanks @GitHunter0, these are fantastic questions! Model comparison is something we have thought about, but we are definitely interested in hearing what users would want. What you could do now is run vetiver.compute_metrics for each model and differentiate between the models by adding a column with the model name (see the sketch below). Is that close to what you are thinking for model comparison?
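Roughly like this (just a sketch, not anything built into vetiver; the dictionary of per-model prediction frames and the "date", "actual", and "predicted" column names are hypothetical):

```python
import pandas as pd
import vetiver
from datetime import timedelta
from sklearn.metrics import mean_absolute_error, r2_score

def compare_models(model_predictions):
    """Compute monitoring metrics per model and stack them for comparison.

    model_predictions: dict mapping a model name to a DataFrame holding a
    "date" column plus "actual" and "predicted" columns for that model.
    """
    all_metrics = []
    for model_name, df in model_predictions.items():
        m = vetiver.compute_metrics(
            data=df,
            date_var="date",
            period=timedelta(weeks=1),
            metric_set=[mean_absolute_error, r2_score],
            truth="actual",
            estimate="predicted",
        )
        m["model"] = model_name  # differentiate models with an extra column
        all_metrics.append(m)
    return pd.concat(all_metrics, ignore_index=True)
```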

The default /predict endpoint is created at VetiverAPI() creation by running the vetiver_post method with handler_predict from a VetiverModel() object. If you would like a differently named endpoint, you could do something like

```python
from vetiver import VetiverAPI

api = VetiverAPI(my_model)
api.vetiver_post(my_model.handler_predict, "myendpointname", check_prototype=True)
```

to add another endpoint with the same functionality. Hope that helps!
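And in case it is useful: once the API is running (for example locally with api.run(port=8080)), the new endpoint can be queried the same way as /predict. This is only a sketch; the host, port, and the new_data columns below are placeholders:

```python
import pandas as pd
import vetiver

# placeholder data frame with the model's predictor columns
new_data = pd.DataFrame({"x1": [1.2], "x2": [3.4]})

# hypothetical: the API was started elsewhere with api.run(port=8080)
endpoint = vetiver.vetiver_endpoint("http://127.0.0.1:8080/myendpointname")
preds = vetiver.predict(endpoint, new_data)
```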

GitHunter0 commented 11 months ago

Hey @isabelizimm , thanks for your thoughtful feedback.

> ...run vetiver.compute_metrics for each model and differentiate between models by adding a column of the model name. Is that close to what you are thinking for model comparison?

It is something along those lines, but in a more streamlined and detailed fashion. I'm posting two MLflow UI screenshots as an example:

[two MLflow UI screenshots]

```python
api = VetiverAPI(my_model)
api.vetiver_post(my_model.handler_predict, "myendpointname", check_prototype=True)
```

That worked, I appreciate it.

juliasilge commented 11 months ago

@GitHunter0 one thing we've noticed when talking to users is that almost everyone has pretty high customization needs when it comes to model monitoring, so we want to take a "code first" approach to monitoring by creating modular functions (like vetiver.compute_metrics that @isabelizimm mentioned) plus some templates (using Quarto).

You may have noticed the Model Card template, and the R side of vetiver does have a monitoring dashboard template (see it rendered and published). These kinds of templates are definitely in scope for vetiver, and our expectation is that someone like you would use them as a jumping-off point for their own (individual, customized) needs. Would definitely love to hear more about what you are wanting to do!
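To make those modular pieces a little more concrete, the monitoring flow today looks roughly like this (a sketch only; the temporary board, the pin name, and the placeholder metrics frame are all hypothetical):

```python
import pandas as pd
import pins
import vetiver

# placeholder frame with the shape that vetiver.compute_metrics() returns
metrics_df = pd.DataFrame({
    "index": pd.to_datetime(["2023-01-01", "2023-01-08"]),
    "n": [100, 100],
    "metric": ["mean_absolute_error", "mean_absolute_error"],
    "estimate": [0.21, 0.17],
})

board = pins.board_temp()  # use a shared board (S3, Connect, ...) for real monitoring

# first run: write the initial metrics pin
board.pin_write(metrics_df, "my_model_metrics", type="csv")

# later runs: update the existing pin with newly computed metrics
new_metrics = metrics_df.assign(index=metrics_df["index"] + pd.Timedelta(weeks=2))
vetiver.pin_metrics(board, new_metrics, "my_model_metrics", overwrite=True)

# read everything back and plot the metrics over time (returns a plotly figure)
vetiver.plot_metrics(board.pin_read("my_model_metrics")).show()
```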

GitHunter0 commented 11 months ago

Hey @juliasilge, sorry for the delay, and thanks very much for the detailed follow-up.

  • Are you interested in more of a report or more of a dashboard?

More of a dashboard, because the needs of my reports are too specific to be included in a general package.

  • What would you be looking for in terms of comparing multiple models? A table with statistical metrics? Or more of a way to keep track of all your deployed models (sometimes I hear this called a "dashboard of dashboards", i.e. mostly about linking to individual model information)?

Actually, I would like all of that; they are all helpful features in my opinion, and I believe MLflow is a good reference in this regard.

And lastly, it would be nice to have a way to deploy multiple models at once, each one to its own endpoint (something like the sketch below, but built in).
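For illustration only (this is not a vetiver feature, just how I imagine wiring it up by hand today, with hypothetical model objects): each VetiverAPI exposes a FastAPI app, so several of them could be mounted under their own paths.

```python
from fastapi import FastAPI
from vetiver import VetiverAPI

# model_a and model_b are hypothetical, already-built VetiverModel objects
app = FastAPI()
app.mount("/model_a", VetiverAPI(model_a).app)  # predictions at /model_a/predict
app.mount("/model_b", VetiverAPI(model_b).app)  # predictions at /model_b/predict

# then serve the combined app, e.g. uvicorn.run(app, port=8080)
```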

I believe that with those additional features, vetiver will reach a much larger audience, especially considering the quality and design of RStudio packages; the products you folks deliver are the best.

juliasilge commented 11 months ago

Awesome; thank you so much for those details. We're waiting on a flexdashboard-style format for Quarto, but in the meantime we could work on initial reporting examples for model monitoring, comparing multiple models, etc.