Closed: yuzisun closed this issue 8 months ago
@msaroufim Can we set up a call with you to discuss this?
Hi @yuzisun, sure! I'd be happy to. Just forwarded this to the core team, lemme get back to you with a few times that work.
Might be easiest to email me, it's marksaroufim@meta.com
Quick suggestion: during my experiments I noticed that when TorchServe is used through the KServe v1/v2 REST protocol, we cannot use the dynamic batching done by the Netty server. This also causes a performance gap compared to sending raw inputs directly. Batching support would be ideal. Thanks!
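(For context, a minimal sketch of how the Netty frontend's dynamic batching is normally enabled when a model is registered through the management API; the archive name, port, and values below are placeholders, not taken from this thread.)

```python
import requests

# Register a model with frontend (Netty) dynamic batching enabled.
# "batch_size" and "max_batch_delay" are standard TorchServe
# registration parameters; "mnist.mar" and the values are placeholders.
resp = requests.post(
    "http://localhost:8081/models",          # default management port
    params={
        "url": "mnist.mar",
        "initial_workers": 1,
        "batch_size": 8,         # max requests aggregated into one batch
        "max_batch_delay": 50,   # ms to wait before sending a partial batch
    },
)
print(resp.status_code, resp.text)
```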
Makes sense; the goal of this issue is to remove the KServe wrapper entirely and implement OIP natively in TorchServe. @gavrishp I think it is possible to just disable the KServe wrapper and send requests directly to the Netty server using the KServe v1/v2 REST protocol, can you try that?
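(A rough sketch of what such a direct request to the Netty frontend could look like, assuming the default inference port 8080 and a placeholder model name; whether the frontend accepts the /v2 route without the KServe wrapper is exactly what this experiment would verify.)

```python
import requests

# KServe v2 / OIP-style inference request sent straight to the
# TorchServe frontend, bypassing the KServe Python wrapper.
# Model name, shape, and data are placeholders for illustration.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}
resp = requests.post(
    "http://localhost:8080/v2/models/mnist/infer",
    json=payload,
)
print(resp.status_code, resp.json())
```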
The feature
KServe has now rebranded the v2 inference protocol as the Open Inference Protocol (OIP) specification. Can we implement OIP natively in TorchServe, like other model servers such as Triton, MLServer, OpenVINO Model Server, and the AMD Inference Server?
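(For reference, these are the core REST routes the Open Inference Protocol defines, which a native implementation in the frontend would need to expose; the base URL and model name below are placeholders.)

```python
import requests

BASE = "http://localhost:8080"   # assumed TorchServe inference port
MODEL = "mnist"                   # placeholder model name

# Core OIP / KServe v2 REST routes:
requests.get(f"{BASE}/v2")                        # server metadata
requests.get(f"{BASE}/v2/health/live")            # server liveness
requests.get(f"{BASE}/v2/health/ready")           # server readiness
requests.get(f"{BASE}/v2/models/{MODEL}")         # model metadata
requests.get(f"{BASE}/v2/models/{MODEL}/ready")   # model readiness
requests.post(f"{BASE}/v2/models/{MODEL}/infer",  # inference
              json={"inputs": []})
```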
Motivation, pitch
Currently TorchServe places the KServe Python server in front of the TorchServe Netty server to adapt to the KServe v1/v2 REST protocol. However, this extra layer provides minimal value and causes numerous maintenance and performance issues. The KServe Python SDK is primarily designed for native Python inference runtimes where users want to implement arbitrary inference code with pre/post processing; TorchServe already provides comparable custom handlers. Therefore, there is no good reason to keep both and route every KServe inference request through
kserve python server -> Netty -> torchserve python worker.
Alternatives
No response
Additional context