@angarg12 TorchServe includes a frontend and a backend. The frontend is essentially an HTTP/gRPC server that manages the backend transparently to users; models are deployed on the backend.
The ping endpoint is used to monitor the model server (frontend) heartbeat (e.g. AWS SageMaker uses this endpoint to check the TorchServe heartbeat).
The model status and list models endpoints are used to check backend/model status.
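For example, with the default management port 8081, backend/model status can be checked roughly like this (the model name is only an illustration):
# list all registered models
curl http://localhost:8081/models
# show one model's status, including its backend workers
curl http://localhost:8081/models/densenet161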
@lxning when the backend crashes, the /ping endpoint returns "Unhealthy" and HTTP code 200. Is this expected?
If so, it doesn't align with my expectations. If /ping returns Unhealthy, I would expect a return code of 5xx.
@angarg12 The ping endpoint is used to monitor the model server (frontend) heartbeat (e.g. AWS SageMaker uses this endpoint to check the TorchServe heartbeat), not the backend. A backend crash does not mean the frontend (i.e. the model server) is unhealthy.
The backend purely runs the customer's model. The frontend automatically retries creating a new worker (i.e. backend process) if a worker dies. That's why the model status and list models endpoints are provided for customers to check backend/model status.
Just to double check that I understand correctly and we are on the same page, I have created a minimal example of the issue that we are experiencing.
This Dockerfile reproduces the issue
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.10.2-cpu-py38-ubuntu20.04-e3
RUN wget https://download.pytorch.org/models/densenet161-8d451a50.pth
RUN echo "raise Exception('boom')" > model_handler.py
RUN echo "inference_address=http://0.0.0.0:8080" >> ts.config
RUN torch-model-archiver --model-name densenet161 --version 1.0 --serialized-file densenet161-8d451a50.pth --handler model_handler --export-path /opt/ml/model/
CMD ["torchserve --start --model-store /opt/ml/model --ts-config ts.config --models densenet161=densenet161.mar"]
Running this container starts a TorchServe instance whose backend workers die:
2022-10-20T21:52:41,536 [DEBUG] W-9013-densenet161_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
The ping endpoint returns Unhealthy, but the HTTP code is 200.
curl -vvvv localhost:8080/ping
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /ping HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: b3ab20d1-8c16-4ec3-95e9-bae10e5aabeb
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 28
< connection: keep-alive
<
{
"status": "Unhealthy"
}
This mimics previous incidents where our workers died but the ALB kept serving traffic to the affected machines, since the /ping endpoint still returned code 200.
Could you please confirm that this behavior is intended?
Thanks.
@angarg12 "ping" response code indicates the model serve alive status; the response message "unhealthy" does not mean the model sever should be removed from ALB because TorchServe supports model isolation and model management transparency. Let's see the following 2 use cases:
One model with 4 workers deployed on TorchServe. One worker dies due to one error in the input data. ALB isolates this TorchServe node if "ping" returns 5xx. In fact, this is unnecessary b/c TorchServe backend can be self recovered immediately. A new worker will be created.
20 models deployed on TorchServe. One model has some issue. ALB isolates this TorchServe node if "ping" returns 5xx. In fact, this is unnecessary b/c the other 19 models can still run correctly.
TorchServe feature "backend worker isolation" is very useful for production stability and low operational cost.
You can also customize "ping" endpoint as AWS SageMaker" in your use case by using TorchServe feature "plugin". Here is the example of the SageMaker ping endpoint.
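As a lighter-weight alternative to writing a plugin, a custom health check can derive backend health from the management API instead of /ping. A minimal sketch, assuming the default management port 8081 and the worker "status" field returned by the describe-model call (the script and model names are only examples):
#!/bin/sh
# check_workers.sh (hypothetical): exit non-zero when the model has no READY backend worker,
# so that an external monitor or custom health endpoint can report 5xx itself.
MODEL=densenet161
curl -sf http://localhost:8081/models/$MODEL | grep -q '"status": "READY"' || exit 1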
@angarg12 According to @maaquib, you also chatted with him internally and agreed that there is no issue with the ping endpoint, so I am closing this ticket. Please feel free to reopen it if needed.
🚀 The feature
The /ping endpoint should return a 5xx error when the backend is Unhealthy.
Motivation, pitch
Currently the /ping health check always returns HTTP OK, whether the backend is Healthy, Unhealthy, or Partially Healthy. This causes problems with some setups (e.g. AWS ALB) that rely on HTTP codes to determine the health of a server.
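For context, an ALB target group health check is driven entirely by HTTP status codes, e.g. a configuration along these lines (the target group ARN is a placeholder):
aws elbv2 modify-target-group \
    --target-group-arn <target-group-arn> \
    --health-check-path /ping \
    --matcher HttpCode=200
With /ping always answering 200, such a check can never take a node with dead workers out of rotation.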
Alternatives
The status quo.
Additional context
This is the current behaviour.
Expected behaviour
The /ping endpoint returns a 5xx error when the backend is Unhealthy.