kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Reusing pipeline elements in a served model scenario #464

Closed turn1a closed 3 years ago

turn1a commented 3 years ago

Dear Kedro crew,

We are working on an extended example using the Titanic dataset that will showcase a reasonably sophisticated, end-to-end Machine Learning project following Software Engineering, Machine Learning and Kedro best practices. We are trying to figure out the best way to serve a model for online inference that not only makes predictions but also performs some preprocessing steps on the data sent to the model.

Here’s some background information regarding our architecture.

We have developed three modular pipelines: data_engineering, feature_engineering and modelling.

The idea is to share all required training nodes of the training pipeline with the prediction pipeline. We are passing a predict=True argument to create_pipeline to indicate which training nodes should be included in or excluded from the prediction pipeline (for example, we leave out fit_imputer but include predict_imputer). We recently realised that a more appropriate way to do this would be to use tags, but we haven't managed to refactor our code yet.
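
The tag-based refactor we have in mind would look roughly like the sketch below (illustrative only; the dataset names are placeholders, not our actual code):

```python
# Rough sketch of selecting prediction nodes via tags instead of a predict=True flag.
from kedro.pipeline import Pipeline, node


def fit_imputer(train_df):          # placeholder training-only step
    ...


def predict_imputer(imputer, df):   # placeholder step shared with prediction
    ...


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            node(fit_imputer, inputs="train_df", outputs="imputer",
                 tags=["training"]),
            node(predict_imputer, inputs=["imputer", "features"],
                 outputs="imputed_features", tags=["training", "prediction"]),
        ]
    )


# The prediction pipeline is then just the tagged subset of the same pipeline.
prediction_pipeline = create_pipeline().only_nodes_with_tags("prediction")
```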

At the end of the modelling pipeline, we output a model. Our prediction pipeline reuses some training nodes and works very well for batch inference, but we would also like to serve this model for online inference in a Docker container. For this purpose, we are using MLflow's pyfunc model wrapping LightGBM, but the model alone is not enough: the data going into the model first needs to go through the steps in data_engineering and feature_engineering.

This is where we are stuck, and we don't know what the best practice would be. The possibilities we have considered are:

  1. We include our Kedro Python package as a dependency of the MLflow model wrapper, import all required nodes into it as pure functions (not nodes) and recreate the steps of our pipelines. This approach has a considerable drawback: we repeat what has already been specified in our Kedro pipelines. On top of that, we don't have access to the main configuration, which is not included in the Python package, so we would need to extract the datasets and parameters from the context.
  2. Once again, this approach involves including the Kedro Python package, but this time we use the Kedro pipelines and a runner (a rough sketch follows this list). In this way, we don't have to worry about recreating the pipeline, but we still lack the configuration.
  3. In this scenario, we are not using MLflow Models at all; instead, we develop our own API that uses the whole Kedro project (including the data catalog, the configuration and other elements). This could be achieved with a Kedro plugin that provides something akin to a kedro serve command. The reasons for this would be:
    • We have more control over the serving application. Currently, there is no way to write custom endpoints or responses from an MLflow pyfunc model. This would be useful in cases like running different pipelines, using different models, handling validation errors (we are in the middle of developing a kedro-pandera plugin for automatic data validation that returns valuable debugging information) and more.
    • We would be able to provide middlewares/hooks that could log input data and predictions, monitor drift and feed those back into Kedro pipelines.
    • We could handle authentication.

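For illustration, option 2 could look roughly like the sketch below. It is only a sketch: the module path my_project.pipelines.prediction, the dataset names features/predictions and the assumption that the required configuration ships with the package are all placeholders.

```python
# Hypothetical sketch of option 2: an MLflow pyfunc wrapper that runs the
# packaged Kedro prediction pipeline with a plain runner.
import mlflow.pyfunc
from kedro.io import DataCatalog, MemoryDataSet
from kedro.runner import SequentialRunner


class KedroPredictionModel(mlflow.pyfunc.PythonModel):
    """Wraps the packaged Kedro prediction pipeline behind the pyfunc API."""

    def load_context(self, context):
        # Import the prediction pipeline from the packaged Kedro project
        # (module path is hypothetical).
        from my_project.pipelines.prediction import create_pipeline

        self.pipeline = create_pipeline()

    def predict(self, context, model_input):
        # Feed the incoming dataframe into an in-memory catalog and run the
        # prediction pipeline. Any other free inputs (fitted imputer, model,
        # parameters) would have to be added to the catalog here as well,
        # e.g. loaded from MLflow artifacts.
        catalog = DataCatalog({"features": MemoryDataSet(model_input)})
        outputs = SequentialRunner().run(self.pipeline, catalog)
        return outputs["predictions"]  # hypothetical output dataset name
```
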
Considering the third option, we imagined that it could be automated (with a plugin), so that the whole web server could be generated and populated with the project's pipelines, hooks and other components. It could then be packed into a container using kedro-docker and easily deployed. Right now, the third option is only a rough idea; we haven't dived into the details because we wanted to ask for your advice first.
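
To make that idea a bit more concrete, a kedro serve-style endpoint could look roughly like the sketch below. None of this exists in Kedro today; FastAPI, the pipeline name "prediction" and the dataset names raw_features/predictions are only assumptions.

```python
# Hypothetical sketch of a "kedro serve"-style endpoint (option 3); this is
# not an existing Kedro command or plugin.
from pathlib import Path
from typing import List

import pandas as pd
from fastapi import FastAPI
from kedro.framework.project import pipelines
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.io import MemoryDataSet
from kedro.runner import SequentialRunner

PROJECT_PATH = Path.cwd()  # assumes the server is started from the project root
bootstrap_project(PROJECT_PATH)

app = FastAPI()


@app.post("/predict")
def predict(records: List[dict]):
    with KedroSession.create(project_path=PROJECT_PATH) as session:
        # Reuse the project's own catalog and configuration.
        catalog = session.load_context().catalog

        # Replace the raw-input dataset with the request payload
        # (dataset name "raw_features" is an assumption).
        catalog.add("raw_features", MemoryDataSet(pd.DataFrame(records)), replace=True)

        # Run only the registered prediction pipeline; "predictions" is
        # assumed to be a free, in-memory output of that pipeline.
        outputs = SequentialRunner().run(pipelines["prediction"], catalog)
    return outputs["predictions"].to_dict(orient="records")
```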

I hope we just missed something and that there is a kedro way/solution to such a scenario.

Galileo-Galilei commented 3 years ago

Hello @kaemo,

@takikadiri and I are going to meet @yetudada @DmitriiDeriabinQB and @laisbsc this week as part of their feedback program. This very question will come up in the discussion as one of our "hottest" current problems.

Our current solution consists of a mix of approaches 2 and 3:

We are thinking about either:

but both come with huge maintenance costs on our side, and we have decided to stick to our current "encapsulating API process" until it hits its limits and becomes intractable for a given project.

I guess kedro-server and the universal deployer will address some of these concerns, but AFAIK there is no kedro way to serve a pipeline right now. Maybe a member of the kedro team will have more concrete elements on this?

limdauto commented 3 years ago

Hi @kaemo and @Galileo-Galilei, here would be my approach:

Let me know what you think of this approach. cc @yetudada @DmitriiDeriabinQB

DmitriiDeriabinQB commented 3 years ago

I think model scoring naturally falls into that "model deployment" epic, which we did identify but have not yet formalised into anything concrete. Kedro in its current state focuses primarily on batch processing, so online inference (especially inference that depends on some feature engineering steps) is out of its scope.

The approach described in option 3 of the original post makes general sense to me; however, understandably, it requires considerable effort to integrate all the moving parts, since Kedro doesn't have anything like a publicly supported "Kedro Server".

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.