pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

`/ready` API for Kubernetes probes to know when the TorchServe backend is ready to receive traffic #3047

Open agunapal opened 3 months ago

agunapal commented 3 months ago

🚀 The feature

This feature would add an API so that a Kubernetes readiness probe can be used to determine when to start sending traffic to TorchServe.

`/ready` will return 200 when all the models specified in config.properties have at least 1 backend worker ready to receive traffic.

This would make it simpler for customers to use TorchServe in a Kubernetes deployment with a multi-model endpoint scenario.
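To illustrate how such an endpoint could be consumed, below is a minimal sketch of an exec-style readiness check in Python. The `/ready` path is the proposed API (it does not exist in TorchServe today), and the inference address `http://localhost:8080` is an assumption for illustration.

```python
# ready_check.py -- hypothetical exec readiness probe for the proposed /ready API.
# Assumes the TorchServe inference address is http://localhost:8080 and that the
# proposed /ready endpoint returns HTTP 200 once every model in config.properties
# has at least one healthy backend worker.
import sys
import urllib.request
import urllib.error

READY_URL = "http://localhost:8080/ready"  # proposed endpoint (not yet in TorchServe)


def main() -> int:
    try:
        with urllib.request.urlopen(READY_URL, timeout=5) as resp:
            # Exit 0 (pod ready) only on HTTP 200.
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, OSError):
        # Frontend not up yet, or workers still starting.
        return 1


if __name__ == "__main__":
    sys.exit(main())
```

A pod spec could then run this script from a readinessProbe (or use an httpGet probe against `/ready` directly), so Kubernetes only routes traffic once every model has at least one worker.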

Motivation, pitch

For a multi-model endpoint use case with Kubernetes, consider a config.properties that defines the following models:

models={\
  "noop": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "noop.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 4,\
        "maxBatchDelay": 100,\
        "responseTimeout": 120\
    }\
  },\
  "vgg16": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "vgg16.mar",\
        "minWorkers": 1,\
        "maxWorkers": 4,\
        "batchSize": 8,\
        "maxBatchDelay": 100,\
        "responseTimeout": 120\
    }\
  }\
}

Today, one can use the `/ping` API to know when TorchServe is up, but this covers the frontend only. When multiple models are loaded, the backend workers take additional time to come up.

Alternatives

One can write a script that calls the `/describe` API for each model, tracks when each has at least 1 backend worker, and then declares the pod ready; a rough sketch follows.
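As a sketch of that alternative, the script below polls the management API's describe-model endpoint (`GET /models/{model_name}`) until every model reports at least one worker in READY state. The management address `http://localhost:8081`, the polling interval, and the model list (taken from the config.properties example above) are assumptions for illustration.

```python
# wait_for_workers.py -- sketch of the /describe-based alternative.
# Polls the TorchServe management API (assumed at http://localhost:8081) until
# every listed model reports at least one backend worker in READY state.
import json
import time
import urllib.request

MANAGEMENT_ADDRESS = "http://localhost:8081"
MODELS = ["noop", "vgg16"]  # models from the config.properties example above


def model_has_ready_worker(model_name: str) -> bool:
    url = f"{MANAGEMENT_ADDRESS}/models/{model_name}"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            versions = json.load(resp)
    except OSError:
        # Management endpoint not reachable yet, or model not registered.
        return False
    # The describe response is a list of model-version entries, each carrying
    # a "workers" list whose items have a "status" field.
    return any(
        worker.get("status") == "READY"
        for version in versions
        for worker in version.get("workers", [])
    )


def wait_until_ready(poll_interval: float = 5.0) -> None:
    while not all(model_has_ready_worker(m) for m in MODELS):
        time.sleep(poll_interval)
    print("All models have at least one READY worker; pod can be marked ready.")


if __name__ == "__main__":
    wait_until_ready()
```

The drawback of this approach is that every deployment has to keep the model list in the script in sync with config.properties, which is exactly what a built-in `/ready` endpoint would avoid.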

Additional context

No response

lxning commented 3 months ago

A PR is in progress that adds health APIs at both the model server level and the model level.