kserve / modelmesh-runtime-adapter

Unified runtime-adapter image of the sidecar containers which run in the modelmesh pods
Apache License 2.0

Triton RuntimeStatus.MethodInfos is missing ModelStreamInfer #80

Open Legion2 opened 9 months ago

Legion2 commented 9 months ago

Triton provides an extension to the standard gRPC inference API for streaming (inference.GRPCInferenceService/ModelStreamInfer). This extension is required to use the vLLM backend with Triton. However, the Triton runtime adapter currently does not advertise this gRPC method, and calling it fails with an error (inference.GRPCInferenceService/ModelStreamInfer: UNIMPLEMENTED: Method not found or not permitted: inference.GRPCInferenceService/ModelStreamInfer).

To resolve this issue, I think the ModelStreamInfer method must be added here: https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/model-mesh-triton-adapter/server/server.go#L267-L269
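A minimal sketch of what such a change could look like. It does not reproduce the adapter's actual code: `MethodInfo` and `buildMethodInfos` are hypothetical stand-ins for the generated model-mesh types and the code at the linked lines. The injection path of `1` assumes the model name is field 1 of the request message, which should be checked against the Triton proto before relying on it.

```go
package main

// MethodInfo is a hypothetical stand-in for the protobuf-generated
// method-info type that the adapter returns in RuntimeStatus.MethodInfos.
type MethodInfo struct {
	// IDInjectionPath is the protobuf field path at which the model ID
	// is injected into the request message.
	IDInjectionPath []uint32
}

// Fully qualified gRPC service name, as it appears in the error message.
const tritonServiceName = "inference.GRPCInferenceService"

// buildMethodInfos returns the set of gRPC methods to advertise,
// keyed by "<service>/<method>". ModelStreamInfer is listed next to
// the non-streaming methods so that calls to it are no longer rejected.
func buildMethodInfos() map[string]*MethodInfo {
	// Assumption: the model name is field 1 in each request message.
	path1 := []uint32{1}

	methods := []string{"ModelInfer", "ModelMetadata", "ModelStreamInfer"}
	mis := make(map[string]*MethodInfo, len(methods))
	for _, m := range methods {
		mis[tritonServiceName+"/"+m] = &MethodInfo{IDInjectionPath: path1}
	}
	return mis
}
```

In the real adapter, the resulting map would be assigned to the `MethodInfos` field of the runtime status response rather than returned from a helper like this.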

Legion2 commented 9 months ago

I have created PR #81 and verified in our environment that ModelStreamInfer requests work with the patch.