pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Inference timeout applies to other calls, including model registration call #2061

Open aleecoveo opened 1 year ago

aleecoveo commented 1 year ago

šŸ› Describe the bug

According to the Management API docs, "response_timeout - If the model's backend worker doesn't respond with inference response within this timeout period, the worker will be deemed unresponsive and rebooted." In practice, however, the timeout applies to any call to a backend worker, notably the call that registers the model. As a result, if the model takes longer than response_timeout seconds to initialize, it cannot be served.
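For reference, here is the same registration call expressed with Python requests (a sketch; the port and archive name are the defaults used in the repro steps below):

```python
# Sketch: register a model with a deliberately short response_timeout.
# Mirrors the curl command in the repro steps; the endpoint and parameters
# are the standard Management API ones, the archive name is from this issue.
import requests

resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "long-init-model.mar",
        "initial_workers": 1,
        "response_timeout": 2,  # seconds; shorter than the handler's init time
    },
)
print(resp.status_code, resp.json())
# Observed: 500 InternalServerException, because the timeout also covers
# the worker's model-load call, not just inference.
```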

Error logs

Here's the log you get when registering a model that takes longer to load than response_timeout (set to 2 seconds in this case):

torchserve-timeout-issue-app-1  | 2023-01-08T17:15:07,523 [DEBUG] W-9004-long-init-model_0.0.1 org.pytorch.serve.wlm.WorkerThread - W-9004-long-init-model_0.0.1 State change null -> WORKER_STARTED
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:07,523 [INFO ] W-9004-long-init-model_0.0.1 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9004
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:07,524 [INFO ] W-9004-long-init-model_0.0.1-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9004.
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:07,525 [INFO ] W-9004-long-init-model_0.0.1 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1673198107525
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:07,526 [INFO ] W-9004-long-init-model_0.0.1-stdout MODEL_LOG - model_name: long-init-model, batchSize: 1
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:07,533 [INFO ] W-9004-long-init-model_0.0.1-stdout MODEL_LOG - Initializing the handler...
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:09,526 [INFO ] W-9004-long-init-model_0.0.1 org.pytorch.serve.wlm.WorkerThread - Backend response time: 2000
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:09,527 [ERROR] W-9004-long-init-model_0.0.1 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 1
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:09,529 [ERROR] W-9004-long-init-model_0.0.1 org.pytorch.serve.wlm.WorkerThread - Backend worker error
torchserve-timeout-issue-app-1  | org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
torchserve-timeout-issue-app-1  |       at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:201) [model-server.jar:?]
torchserve-timeout-issue-app-1  |       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
torchserve-timeout-issue-app-1  |       at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
torchserve-timeout-issue-app-1  |       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
torchserve-timeout-issue-app-1  |       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
torchserve-timeout-issue-app-1  |       at java.lang.Thread.run(Thread.java:833) [?:?]
torchserve-timeout-issue-app-1  | 2023-01-08T17:15:09,552 [WARN ] W-9004-long-init-model_0.0.1 org.pytorch.serve.wlm.BatchAggregator - Load model failed: long-init-model, error: Worker died.

Installation instructions

TorchServe was run using the Docker image 15279ec970c4, which contains version 0.7.0.

Model Packaging

Model files are here: https://gist.github.com/aleecoveo/75edbb6b9ac368e84a29dcf0fe96e439

Archived with: torch-model-archiver --model-name long-init-model --version 0.0.1 --model-file dummy_model.py --serialized-file dummy_model.pt --handler ./handler_with_long_initialization
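The handler itself (see the gist for the exact files) is essentially a BaseHandler whose initialize() sleeps past the timeout; roughly, something like this sketch (the class name and the 10-second sleep are illustrative, not copied from the gist):

```python
# handler_with_long_initialization.py -- illustrative sketch, not the exact gist contents.
import time

from ts.torch_handler.base_handler import BaseHandler


class HandlerWithLongInitialization(BaseHandler):
    def initialize(self, context):
        print("Initializing the handler...")  # matches the MODEL_LOG line in the logs above
        time.sleep(10)                         # deliberately longer than response_timeout=2
        super().initialize(context)            # then load dummy_model.pt as usual
```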

config.properties

No response

Versions

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.7.0b20221212
torch-model-archiver==0.7.0b20221212

Python version: 3.8 (64-bit runtime)
Python executable: /usr/bin/python

Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
intel-extension-for-pytorch==1.13.0
numpy==1.23.5
nvgpu==0.9.0
psutil==5.9.4
pygit2==1.11.1
pylint==2.6.0
pytest==7.2.0
pytest-cov==4.0.0
pytest-mock==3.10.0
requests==2.28.1
requests-toolbelt==0.10.1
torch==1.13.0+cpu
torch-model-archiver==0.7.0b20221212
torch-workflow-archiver==0.2.6b20221212
torchaudio==0.13.0+cpu
torchserve==0.7.0b20221212
torchtext==0.14.0
torchvision==0.14.0+cpu
transformers==4.25.1
wheel==0.38.4

Java Version:

OS: Ubuntu 20.04.5 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: N/A

Repro instructions

  1. Archive model using files at https://gist.github.com/aleecoveo/75edbb6b9ac368e84a29dcf0fe96e439 and cd to the directory
  2. Start the TorchServe server; I used the Docker image: docker run -it -p 8080:8080 -p 8081:8081 -p 8082:8082 -p 7070:7070 -p 7071:7071 -v $(pwd):/home/model-server/model-store 15279ec970c4
  3. Register the model with curl -X POST "http://localhost:8081/models?url=long-init-model.mar&initial_workers=1&response_timeout=2". The call fails with:
     {
       "code": 500,
       "type": "InternalServerException",
       "message": "Failed to start workers for model long-init-model version: 0.0.1"
     }

Possible Solution

No response

mreso commented 1 year ago

Hi,

thanks for filing the issue.

I can see how this is an issue. What I am not sure about is whether this is a docs bug (remove "with inference response" so the text covers any response) or whether there is value in changing the behavior more generally.

My understanding is that the timeout is meant for recovering from rare failures or edge cases. Are you seeing workers die frequently, and is that why you set a low timeout?

Would be great to hear more about the use case!

aleecoveo commented 1 year ago

Hello!

In fact, what we were originally trying to do with the timeout doesn't work (somewhere along the line we misread the documentation, and somehow didn't notice even while writing this issue! 🤦). In our case, the service that calls TorchServe has its own timeout, and we didn't want the backend to waste time processing a request the client was no longer waiting for (while fresh requests sit idle in the queue). Setting response_timeout obviously won't achieve that, since we don't want the worker to reboot.

For cases like this, some reverse proxies (e.g. Skipper) use a LIFO queue for requests; does TorchServe use LIFO or FIFO? Is there anything else you would suggest to prevent the backend from processing requests that nobody is waiting for?
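For concreteness, the kind of thing we had in mind is something like this sketch, assuming the client forwards its own deadline in a custom header (X-Request-Deadline here is a hypothetical convention, not a TorchServe feature):

```python
# Sketch only: skip work for requests whose caller has already timed out.
import time

from ts.torch_handler.base_handler import BaseHandler


class DeadlineAwareHandler(BaseHandler):
    def handle(self, data, context):
        # With batch size 1 (as in this issue), only the first request's headers matter.
        deadline = context.get_request_header(0, "X-Request-Deadline")
        if deadline is not None and time.time() > float(deadline):
            # Return immediately instead of running inference nobody is waiting for.
            return [{"skipped": "client deadline passed"}]
        return super().handle(data, context)
```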

For the bug itself, I agree that it would probably be sufficient to change the documentation.

Thanks very much!