sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

[Investigation] PyTorch Model Serving Solution #2399

Open brightcoder01 opened 4 years ago

brightcoder01 commented 4 years ago

SQLFlow extends the SQL syntax to describe the end-to-end machine learning pipeline, and the end-to-end solution includes model serving. The data transformation logic must stay consistent between the training and serving stages. The PyTorch preprocessing investigation is concluded in #2276.

TorchServe

TorchServe is the official serving framework from the PyTorch repo. The key component that expresses the serving logic is the TorchServe handler. Handlers are Python classes that implement the methods preprocess, inference, and postprocess. TorchServe provides some built-in handlers, and we can also write and contribute custom handlers of our own.
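For reference, a minimal sketch of a custom handler built on TorchServe's BaseHandler; the tabular payload layout and feature handling here are illustrative assumptions, not SQLFlow's actual scheme:

```python
# A minimal custom handler sketch. BaseHandler.initialize() loads self.model
# from the model archive; the request payload format below is an assumption.
import torch
from ts.torch_handler.base_handler import BaseHandler

class TabularHandler(BaseHandler):
    def preprocess(self, data):
        # Each request carries a JSON list of raw feature values under
        # "body" or "data"; batch them into a single float tensor.
        rows = [req.get("body") or req.get("data") for req in data]
        return torch.tensor(rows, dtype=torch.float32)

    def inference(self, data):
        # Run the forward pass without tracking gradients.
        with torch.no_grad():
            return self.model(data)

    def postprocess(self, data):
        # TorchServe expects one JSON-serializable result per request.
        return data.tolist()
```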

Questions:

LibTorch

Convert the PyTorch model to TorchScript using torch.jit.trace or torch.jit.script. Then we can load the TorchScript module with LibTorch (C++) and build our own ModelServer on top of it. Besides the model inference handled by LibTorch, we need to add preprocessing, postprocessing, an RPC service, automatic model updates, model instance management, and so on.
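A minimal sketch of the export side in Python (the toy model and input shape are illustrative); the saved model.pt can then be loaded from C++ via torch::jit::load and served behind our own RPC layer:

```python
# Export a model to TorchScript; the one-layer toy model is an assumption.
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.softmax(self.fc(x), dim=1)

model = Net().eval()

# trace() records the ops run on an example input; script() would instead
# compile the Python source and also preserve data-dependent control flow.
traced = torch.jit.trace(model, torch.randn(1, 4))
traced.save("model.pt")  # load in C++ with torch::jit::load("model.pt")
```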

Question:

ONNX

Convert the PyTorch model to ONNX format using the API torch.onnx.export(torch_model, ...), and then serve the ONNX model with ONNX Runtime.
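A minimal sketch of the round trip (the toy model, file name, and tensor names are illustrative):

```python
# Export a PyTorch model to ONNX and run it with ONNX Runtime.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Linear(4, 2).eval()  # toy model, an assumption
dummy = torch.randn(1, 4)

# Mark dim 0 as dynamic so the served model accepts any batch size.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}})

# Serve: load the ONNX file and run inference on a NumPy batch.
sess = ort.InferenceSession("model.onnx")
batch = np.random.randn(3, 4).astype(np.float32)
(outputs,) = sess.run(["output"], {"input": batch})
print(outputs.shape)  # (3, 2)
```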

Preprocessing: ONNX Runtime provides featurizers_ops, which we can use at the serving stage. Can we also leverage them at the training stage? - Proposal: We can use featurizers_ops as a separate package to preprocess the data before feeding it into the model. When exporting the model for serving, we can convert the whole pipeline to ONNX format (featurizers_ops + ONNX ops in the model). An issue should be filed to confirm with the ONNX team that featurizers_ops can be used as a separate package.

Question:

workingloong commented 4 years ago

There is a TorchServe introduction given by AWS: "TorchServe delivers lightweight serving with low latency, so you can deploy your models for high performance inference." But I haven't found any experiment backing up the low-latency claim.

brightcoder01 commented 4 years ago

Opened questions on the PyTorch Forum to track:

- What's the official high-performance serving solution for PyTorch?
- How to keep data preprocessing consistent between training and serving?

brightcoder01 commented 4 years ago

The serving solution based on LibTorch is preferred, since it is more PyTorch-native.