[FR] [Roadmap] Inline examples of model flavor usage with MLflow

BenWilson2 commented 2 years ago

MLflow Roadmap Item

This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We’ve identified this feature as a highly requested addition to the MLflow package based on community feedback. We're seeking a community contribution for the implementation of this feature and will enthusiastically support the development and review of a submitted PR for this.

Contribution Note

As with other roadmap items, multiple contributors may want to work on an issue. While we don’t discourage collaboration, we strongly encourage assigning a primary contributor to each roadmap issue to simplify the merging process. The items on the roadmap are of high priority. Because of the widespread demand for roadmap features, we ask potential contributors to take on the work (creating a PR, making changes, and ensuring adequate test coverage for the feature) only if they are willing and able to see the implementation through to a merged state.

Feature scope

This roadmap feature’s complexity is classified as:

[X] good-first-issue: This feature is limited in complexity and effort required to implement.
[ ] simple: This feature does not require a large amount of effort to implement and / or is clear enough to not need a design discussion with maintainers.
[ ] involved: This feature will require a substantial amount of development effort but does not require an agreed-upon design from the maintainers. The feedback given during the PR phase may be involved and necessitate multiple iterations before approval. (Please bear with us as we collaborate with you to make a great contribution)
[ ] design-recommended: This is a substantial feature that should have a design document approved prior to working on an implementation (to save your time, not ours). After agreeing to work on this feature, a maintainer will be assigned to support you throughout the development process.

Proposal Summary

This meta-FR covers the conversion of the model flavors documentation to be consistent with the new, more user-friendly design of [Pmdarima](https://mlflow.org/docs/latest/models.html#pmdarima-pmdarima-experimental), Model Evaluation and Diviner.

If taking on one of the below listed flavors, please request assignment below and we will tag you to that flavor's implementation.

[X] pyfunc #7867
[X] crate (R function) #7582
[X] h2o #8292
[X] keras #6781
[X] MLeap #7657
[X] pytorch #7121
[X] scikit-learn #6694
[X] SparkML #7706
[X] Tensorflow #8313
[X] ONNX #7398
[X] gluon #8403
[X] XGBoost #7803
[X] LightGBM #7865
[X] CatBoost #7691
[X] SpaCy
[X] Fastai #7219
[X] Statsmodels (linear model and timeseries model usage) #8394
[X] Prophet #7719

Motivation

What is the use case for this feature?

Make it easier to understand common usage patterns for each of these officially supported libraries within MLflow.

Why is this use case valuable to support for MLflow users in general?

Users currently have to work out the API patterns by hunting through the repository and using trial and error. Inline examples reduce that effort.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

[ ] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[X] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[ ] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/projects: MLproject format, project running backends
[ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
[ ] area/server-infra: MLflow Tracking server backend
[ ] area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

[ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

Languages

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

Integrations

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

BenWilson2 commented 2 years ago

For any questions, concerns, or clarification on implementing this issue, please ping @dbczumar

marijncv commented 2 years ago

@dbczumar I created #6694 that contributes to this FR. Let me know if you think it's going in the right direction, then I can try some of the other flavours afterwards!

rddefauw commented 1 year ago

@dbczumar I'd be happy to tackle fastai for this FR.

Rusteam commented 1 year ago

@dbczumar I can help out with ONNX

BenWilson2 commented 1 year ago

@Rusteam thank you for volunteering! Please tag us in the PR when you file it and let us know if you have any questions :)

agoyot commented 1 year ago

@dbczumar I can help with the crate (R) flavor if needed.

sniafas commented 1 year ago

@dbczumar Happy to start with a Tensorflow example

dbczumar commented 1 year ago

@sn8k2s @sniafas Thank you both so much! Those both sound great! Looking forward to your pull requests!

agoyot commented 1 year ago

@dbczumar Hello, I can help for the CatBoost example

JaynouOliver commented 1 year ago

@dbczumar Hey there, I would like to contribute here. Can you please assign me as a contributor for this issue? I would like to work on the TensorFlow example.

dbczumar commented 1 year ago

Hi @agoyot @JaynouOliver , that would be wonderful! Thank you in advance for your contributions!

JaynouOliver commented 1 year ago

thank you!

dipanjank commented 1 year ago

Hi @dbczumar I would like to contribute here. I can start with SparkML, perhaps?

dbczumar commented 1 year ago

Hi @dipanjank that sounds great! Thank you for your help!

dipanjank commented 1 year ago

> Hi @dipanjank that sounds great! Thank you for your help!

@dbczumar Please see #7706

dipanjank commented 1 year ago

@dbczumar the SparkML example is merged :) I can pick up Prophet next?

dipanjank commented 1 year ago

@dbczumar please review https://github.com/mlflow/mlflow/pull/7719

dipanjank commented 1 year ago

@BenWilson2 @dbczumar I will pick up the Spacy Example next - it doesn't seem to be assigned to someone atm.

dipanjank commented 1 year ago

@dbczumar I'll pick up the sub-task for xgboost next.

BenWilson2 commented 1 year ago

@dipanjank Sounds great! Thank you for all of the contributions! 👍

dipanjank commented 1 year ago

> @dipanjank Sounds great! Thank you for all of the contributions! 👍

It's great to be able to contribute to such a popular project :)

dipanjank commented 1 year ago

Will pick up LightGBM next :)

dipanjank commented 1 year ago

Will pick up tensorflow next!

ericvincent18 commented 1 year ago

Hi @dbczumar I'd be happy to contribute here, could I pick up h2o example?

krishnakalyan3 commented 1 year ago

Perhaps a TensorFlow example is not required, as it's already covered in https://github.com/mlflow/mlflow/pull/6781? Maybe we can just add a note that with TensorFlow v2, Keras has tighter integration with TensorFlow. @BenWilson2 what do you think?

dipanjank commented 1 year ago

> Perhaps a TensorFlow example is not required, as it's already covered in #6781? Maybe we can just add a note that with TensorFlow v2, Keras has tighter integration with TensorFlow. @BenWilson2 what do you think?

I wanted to ask this as well. IMHO the only thing the "keras" example doesn't cover is the scenario where an mlflow user has hand-written a low-level training loop and wants to record some params / metric on every iteration, e.g.


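import mlflow
import tensorflow as tf

# Assumes `model`, `loss_fn`, and `optimizer` are defined elsewhere.
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_hat = model(x, training=True)
        loss_value = loss_fn(y, y_hat)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss_value

# Log from eager code: inside @tf.function, loss_value is a symbolic tensor,
# so the metric is recorded once the step returns a concrete value.
loss_value = train_step(x, y)
mlflow.log_metric(key="loss", value=float(loss_value), step=step)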

Do we see this as a valuable addition?

BenWilson2 commented 1 year ago

@krishnakalyan3 @dipanjank I think a simplified example in line with what you're suggesting is ideal. We don't need to implement a custom CNN from scratch as an example, but a fairly simple integration within a training loop would be great for the TF flavor.
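To make the "simple integration within a training loop" idea concrete, here is a minimal, framework-agnostic sketch of per-step metric logging. It is not the TF flavor example itself: the update step is plain Python standing in for a `tf.GradientTape` step, and `train_linear`, `record`, and `history` are hypothetical names used only for illustration. The logger is injected so the sketch runs without a tracking server.

```python
from typing import Callable, List, Sequence, Tuple


def train_linear(
    xs: Sequence[float],
    ys: Sequence[float],
    log_metric: Callable[..., None],
    steps: int = 50,
    lr: float = 0.05,
) -> float:
    """Fit y = w * x by gradient descent, logging the loss at every step."""
    w = 0.0
    n = len(xs)
    for step in range(steps):
        # Mean squared error and its gradient with respect to w.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
        # Pass a plain Python float, not a framework tensor.
        log_metric("loss", float(loss), step=step)
    return w


# Hypothetical stand-in logger that records calls in memory.
history: List[Tuple[str, float, int]] = []


def record(key: str, value: float, step: int) -> None:
    history.append((key, value, step))


w = train_linear([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], record)
```

With MLflow installed, `mlflow.log_metric` could be passed in place of `record` inside a `with mlflow.start_run():` block, since it accepts a key, a value, and a `step` keyword.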

BenWilson2 commented 1 year ago

@ericvincent18 we'd be thrilled to have you work on the H2o example! Let us know when your PR is ready.

ericvincent18 commented 1 year ago

@BenWilson2 @dbczumar please review #8292 , thanks !

ericvincent18 commented 1 year ago

@BenWilson2 I can pick up statsmodels as well once h2o is merged.

BenWilson2 commented 1 year ago

@ericvincent18 you most certainly may :) Keep in mind that the statsmodels example will be split into 2 parts (for the two separate "families of APIs" in statsmodels). I'd recommend doing one on some form of traditional regression statistical model and another on a timeseries model. That way we can demonstrate the two different paradigms in play within that library. Feel free to choose any base model types within those families (a statistical model and a timeseries model) that tickle your fancy.

ericvincent18 commented 1 year ago

@BenWilson2 Will do, thanks Ben!

ericvincent18 commented 1 year ago

@BenWilson2 #8394 ready for review :)

ericvincent18 commented 1 year ago

I'll pick up gluon as well.
