schuderer / mllaunchpad

Deploy Machine Learning Solutions with Ease
Apache License 2.0

How does ML Launchpad differ from Metaflow #58

Open schuderer opened 4 years ago

schuderer commented 4 years ago

Netflix's Metaflow was released to the public on December 3, 2019. It sounds like Metaflow and ML Launchpad are very similar. What are the similarities and what are the differences?

schuderer commented 4 years ago

TL;DR: While Metaflow's step/flow/graph interface helps you create generic data science pipelines, ML Launchpad focuses on facilitating the machine learning train/test/predict life cycle and on exposing predictions as a web service.

Metaflow has a very, very similar motivation to ML Launchpad. In fact, Metaflow's problem statement could be used almost unaltered as a motivation for ML Launchpad. They even use a "...with ease" slogan, just as we are! 😄 👍

We are really stoked that more and more people get that making data-sciency stuff work in production is currently harder than it should be, and that having a model work in an ad-hoc development environment is still a long way from using it in a robust and maintainable production setting. The Metaflow team also understood that limiting yourself to a small number of whitelisted tools or libraries does not cut it if you want to stay current. It's really awesome that Netflix chose to release their framework, and we are looking forward to trying it out!

Regarding differences, we think that ML Launchpad has a different focus than Metaflow:

☝️ While Metaflow's step/flow/graph interface helps you create generic data science pipelines, ML Launchpad focuses on facilitating the machine learning train/test/predict life cycle and on exposing predictions as a web service.

Everything that you can do in ML Launchpad, you can do in Metaflow as well. Except that you then have to solve the model-store, data-abstraction, out-of-code-configurability and web API parts yourself if you want them. That list covers almost all of ML Launchpad's features 😁, so what the two really have in common is a base class/interface that helps you (more or less, see footnote) separate your model's code from the IO and everything else going on around it.
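
To make the "base class that separates model code from IO" idea concrete, here is a minimal sketch of a train/test/predict interface. It is not ML Launchpad's or Metaflow's actual API: `LifecycleModel`, `MeanModel`, `InMemoryModelStore` and `run_training` are hypothetical names invented for this illustration, and the "model store" is just a dict of pickled blobs.

```python
import pickle
from abc import ABC, abstractmethod


class LifecycleModel(ABC):
    """Hypothetical base class: subclasses contain model code only, no IO."""

    @abstractmethod
    def train(self, rows):
        """Return a fitted model object built from the training rows."""

    @abstractmethod
    def test(self, model, rows):
        """Return a dict of metrics for the fitted model."""

    @abstractmethod
    def predict(self, model, features):
        """Return a prediction for one dict of features."""


class MeanModel(LifecycleModel):
    """Toy model that always predicts the mean of the training targets."""

    def train(self, rows):
        targets = [row["y"] for row in rows]
        return {"mean": sum(targets) / len(targets)}

    def test(self, model, rows):
        errors = [abs(row["y"] - model["mean"]) for row in rows]
        return {"mae": sum(errors) / len(errors)}

    def predict(self, model, features):
        return model["mean"]


class InMemoryModelStore:
    """Hypothetical stand-in for a model store: pickles into a dict."""

    def __init__(self):
        self._blobs = {}

    def save(self, name, model, metrics):
        self._blobs[name] = pickle.dumps({"model": model, "metrics": metrics})

    def load(self, name):
        return pickle.loads(self._blobs[name])


def run_training(impl, store, name, train_rows, test_rows):
    """The surrounding framework, not the model code, handles persistence."""
    model = impl.train(train_rows)
    metrics = impl.test(model, test_rows)
    store.save(name, model, metrics)
    return metrics
```

The point of the split is that `MeanModel` never decides where data comes from or where the trained model ends up; a web API layer could call `predict` on whatever `store.load(...)` returns without the model code knowing about it.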

Metaflow looks awesome. There are a lot of things in Metaflow that we admire. It is definitely something to check out if your data-preparation steps are complex enough to warrant breaking them up into separate resumable and inspectable steps. If you are already using AWS, it's a no-brainer -- try it out! The ability to debug your flow's steps in a Jupyter notebook is simply a thing of beauty. 😍 And we are probably going to ...borrow... @retry as a config setting, and apply some of Metaflow's ideas to improve the development experience and debuggability outside of full-featured IDEs like PyCharm, particularly in Jupyter notebooks and in Spyder.
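
As a rough idea of what "retry as a config setting" could look like, here is a small sketch. It is neither Metaflow's `@retry` implementation nor an existing ML Launchpad feature: the `retry_from_config` decorator and the `retry`/`times`/`wait_seconds` config keys are assumptions made up for this example.

```python
import functools
import time


def retry_from_config(config, sleep=time.sleep):
    """Hypothetical decorator: retry behaviour comes from config, not code."""
    settings = config.get("retry", {})
    times = int(settings.get("times", 0))            # extra attempts after the first
    wait = float(settings.get("wait_seconds", 0.0))  # pause between attempts

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise  # out of retries: surface the last error
                    sleep(wait)
        return wrapper

    return decorator


# Demo: a loader that fails twice before succeeding.
attempts = []


@retry_from_config({"retry": {"times": 2, "wait_seconds": 0}})
def flaky_load():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("temporary failure")
    return "ok"
```

With this shape, changing the number of retries for a different environment means editing the config, not the decorated function.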

footnote: In ML Launchpad's philosophy, code that refers to e.g. AWS S3 paths is something you would want to hide behind config. We would also not count config-in-model-code (like timeouts) as "proper" config, because such settings cannot easily be swapped out to run the code in another environment without code changes. We find it useful to be able to swap configurations for different environments (including getting the data from different sources) without having to adapt our code. Clearly, when using AWS S3 you are able to swap environments without the code noticing, but we are not willing to assume that this kind of backend flexibility is always available, hence ML Launchpad's file-configured data source abstraction. But as I said, this is not due to one being wrong and the other being right, it's due to a different focus.
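
A minimal sketch of the idea behind a config-driven data source abstraction follows. It does not use ML Launchpad's real config format or API: the `datasources` key, the `inline_csv`/`csv_file` type names, the `get_data` helper and the production path are all hypothetical and only illustrate swapping environments without touching model code.

```python
import csv
import io


def read_csv_text(options):
    """Read rows from CSV text embedded in the config (handy for dev/tests)."""
    return list(csv.DictReader(io.StringIO(options["text"])))


def read_csv_file(options):
    """Read rows from a CSV file at the configured path."""
    with open(options["path"], newline="") as handle:
        return list(csv.DictReader(handle))


# Maps the "type" named in the config to the function that reads it.
READERS = {"inline_csv": read_csv_text, "csv_file": read_csv_file}


def get_data(config, name):
    """Look up a data source by logical name; the caller never sees paths."""
    source = config["datasources"][name]
    return READERS[source["type"]](source)


DEV_CONFIG = {
    "datasources": {
        "customers": {"type": "inline_csv", "text": "id,age\n1,34\n2,51\n"},
    }
}

PROD_CONFIG = {
    "datasources": {
        # Hypothetical path, for illustration only.
        "customers": {"type": "csv_file", "path": "/data/prod/customers.csv"},
    }
}


def average_age(config):
    """'Model code': asks for 'customers' by name, whatever the environment."""
    rows = get_data(config, "customers")
    return sum(int(row["age"]) for row in rows) / len(rows)
```

`average_age(DEV_CONFIG)` and `average_age(PROD_CONFIG)` run the same code; only the config decides where the `customers` rows come from.
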

original notes

Metaflow was published on 3 December 2019.

Similarities:
- Almost identical philosophy to ML Launchpad: data scientists should focus on model and feature-building code, not infrastructure.
- Similarities in technical implementation (base class, pickling)
- Use-any-python-libraries-you-like philosophy
- Ease-of-use before (premature) performance optimizations
- Logging
- Extensive documentation

Differences:
- Made for use with AWS (can maybe be made to work without)
- Philosophy of code-only (no GUI or similar planned)
- In Metaflow, the data scientist is expected to solve everything with code (data specifics, resources, error handling); ML Launchpad solves these kinds of environment specifics through config.
- Metaflow uses a flow/step/graph metaphor (data pass-down/join/merge/…) while ML Launchpad uses the familiar train/test/predict
- Uses an extensive object hierarchy (Metaflow->Flow->Run->Step->Task->DataArtifact)
- No R support yet
- Hard data references in the data scientist's code (some access-level abstraction for AWS S3)
- Can resume steps of a larger workflow (if implemented as separate steps by the data scientist)
- Data scientist handles when to save/load models in code (interplay of graph metaphor and custom code)
- No API support in the open-sourced version