pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Metrics API returns empty response #1961

Open pankajvshrma opened 1 year ago

pankajvshrma commented 1 year ago

🐛 Describe the bug

I have written a custom handler. After running TorchServe and registering the workflow, when I try curl http://127.0.0.1:8082/metrics, it returns nothing.

Error logs

curl http://127.0.0.1:8082/metrics returns an empty response.

Installation instructions

Installed TorchServe from source; not using Docker.

Model Packaging

I use the parallel workflow framework and packaged 4 video models and 3 audio models, with two handler files: one for audio and one for video.
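
For context, a workflow like this is typically packaged with torch-workflow-archiver; a minimal sketch, assuming hypothetical spec and handler file names:

# Package the workflow spec and its handler into a .war archive (file names are placeholders)
torch-workflow-archiver -f \
    --workflow-name av_parallel_wf \
    --spec-file workflow.yaml \
    --handler workflow_handler.py \
    --export-path tools/deployment/workflow_store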

config.properties

enable_envvars_config=true
max_request_size=65535000
number_of_netty_threads=8
netty_client_threads=8
job_queue_size=1000
default_response_timeout=300
unregister_model_timeout=300
install_py_dep_per_model=true

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
enable_metrics_api=true
metrics_format=prometheus

Versions


Environment headers

Torchserve branch:

torchserve==0.6.0b20221029
torch-model-archiver==0.6.0b20221029

Python version: 3.8 (64-bit runtime)
Python executable: /home/chingari/434/my_env/bin/python

Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
intel-extension-for-pytorch==1.12.300
numpy==1.23.4
nvgpu==0.9.0
psutil==5.9.3
pygit2==1.6.1
pylint==2.6.0
pytest==7.2.0
pytest-cov==4.0.0
pytest-mock==3.10.0
requests==2.28.1
requests-toolbelt==0.10.1
torch==1.12.0+cu113
torch-model-archiver==0.6.0b20221029
torch-workflow-archiver==0.2.4b20221029
torchaudio==0.12.0+cu113
torchserve==0.6.0b20221029
torchtext==0.13.0
torchvision==0.13.0+cu113
transformers==4.11.0
wheel==0.37.1

Java Version:

OS: Ubuntu 20.04.4 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: N/A

Is CUDA available: Yes
CUDA runtime version: 11.6.112
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.85.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.0

Repro instructions

torchserve --start \
    --model-store tools/deployment/model_store \
    --workflow-store tools/deployment/workflow_store \
    --ncs \
    --ts-config tools/deployment/config.properties \
    --foreground
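
The report does not include the registration step; a workflow is typically registered through the management API before inference, roughly like this (archive and workflow names are placeholders):

# Register the workflow archive from the workflow store
curl -X POST "http://127.0.0.1:8081/workflows?url=my_workflow.war"

# Run inference through the workflow endpoint
curl http://127.0.0.1:8080/wfpredict/my_workflow -T sample_input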

Logs

2022-11-09T07:57:21,740 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-11-09T07:57:21,815 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.6.0
TS Home: /home/chingari/434/my_env/lib/python3.8/site-packages
Current directory: /home/chingari/434/Video-Classification/mmaction2
Temp directory: /tmp
Number of GPUs: 1
Number of CPUs: 32
Max heap size: 30688 M
Python executable: /home/chingari/434/my_env/bin/python
Config file: tools/deployment/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/chingari/434/Video-Classification/mmaction2/tools/deployment/model_store
Initial Models: N/A
Log dir: /home/chingari/434/Video-Classification/mmaction2/logs
Metrics dir: /home/chingari/434/Video-Classification/mmaction2/logs
Netty threads: 8
Netty client threads: 8
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 65535000
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.|http(s)?://.]
Custom python dependency for model allowed: true
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/chingari/434/Video-Classification/mmaction2/tools/deployment/workflow_store
Model config: N/A
2022-11-09T07:57:21,820 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2022-11-09T07:57:21,838 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-11-09T07:57:21,886 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-11-09T07:57:21,886 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2022-11-09T07:57:21,888 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2022-11-09T07:57:21,888 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-11-09T07:57:21,889 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082

Possible Solution

I tried making API calls. Even after successful inference calls, the Metrics API response is still empty, although the HTTP response code is 200.
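
A quick way to confirm the empty-body-with-200 behavior, assuming the default metrics port from the config above:

# Prints just the HTTP status code: 200
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8082/metrics

# Prints 0 when the response body is empty
curl -s http://127.0.0.1:8082/metrics | wc -c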

frankiedrake commented 1 year ago

Do you see metrics in your TorchServe log files (ts_log.log, ts_metrics.log)?

pankajvshrma commented 1 year ago

@frankiedrake Yes, I see metrics being logged in the TorchServe log files (ts_log.log, ts_metrics.log). Sample entries from ts_metrics.log:

2022-11-09T10:33:36,163 - Requests2XX.Count:1|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667982096
2022-11-09T10:33:51,163 - Requests2XX.Count:1|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667982096
2022-11-09T10:34:06,163 - Requests2XX.Count:1|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667982096
2022-11-09T10:34:21,163 - Requests2XX.Count:1|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667982096
2022-11-09T10:34:35,383 - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - DiskAvailable.Gigabytes:247.98999786376953|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - DiskUsage.Gigabytes:285.09499740600586|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - DiskUtilization.Percent:53.5|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - GPUMemoryUtilization.Percent:47.659371200277924|#Level:Host,device_id:0|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - GPUMemoryUsed.Megabytes:10975|#Level:Host,device_id:0|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - GPUUtilization.Percent:0|#Level:Host,device_id:0|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - MemoryAvailable.Megabytes:111133.0078125|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - MemoryUsed.Megabytes:15022.95703125|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:35,384 - MemoryUtilization.Percent:12.8|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667990075
2022-11-09T10:34:36,163 - Requests2XX.Count:1|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667982096
2022-11-09T10:34:51,163 - Requests2XX.Count:1|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667982096
2022-11-09T10:35:06,163 - Requests2XX.Count:1|#Level:Host|#hostname:ip-10-12-138-251,timestamp:1667982096

frankiedrake commented 1 year ago

I was getting an empty response when I specified a non-existent metric, or when I queried the endpoint before any inference API request had been made. Maybe the logs emit something odd when you query the metrics? Did you try binding Prometheus to see if the metrics are available there?
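
For reference, the metrics endpoint also accepts a name[] query filter, so requesting a metric name that was never emitted yields an empty body; the names below are the standard frontend metrics from the TorchServe docs:

# All metrics
curl http://127.0.0.1:8082/metrics

# Only the named metrics; a misspelled name here also produces an empty response
curl "http://127.0.0.1:8082/metrics?name[]=ts_inference_latency_microseconds&name[]=ts_queue_latency_microseconds"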

pankajvshrma commented 1 year ago

@frankiedrake Yes, I tried binding Prometheus but still get no output there. Also, I am not using any custom metrics. Even after a successful API call I see an empty response.

pankajvshrma commented 1 year ago

Is this a known problem with workflow-based APIs?

maaquib commented 1 year ago

Hey @pankajvshrma, right now we only support 3 inference metrics on the Prometheus metrics endpoint. Unless you run an inference, the metrics will be empty (with a 200 OK response). If you have a different expectation, let us know.
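
For reference, once a (non-workflow) inference has run, the endpoint emits those three frontend metrics in Prometheus exposition format, roughly like this (label values and numbers are illustrative):

# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="...",model_name="mymodel",model_version="default",} 1.0
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="...",model_name="mymodel",model_version="default",} 1990.348
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="...",model_name="mymodel",model_version="default",} 364.884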

pankajvshrma commented 1 year ago

Hey @maaquib, the /metrics endpoint is empty even after inference. I was wondering if this is an issue with workflows.

maaquib commented 1 year ago

@pankajvshrma Seems like a bug with workflow. Will look into this

pankajvshrma commented 1 year ago

@maaquib is there any update on this?

gasabr commented 1 year ago

I'm encountering the exact same situation in the same environment (torchserve 0.6.0-cpu): even after model inference, the /metrics endpoint returns nothing. I can also provide a Docker/Helm setup to reproduce.

hungtooc commented 1 year ago

I'm encountering the same issue

Fissium commented 1 year ago

I have the same issue using the 0.7.1-cpu Docker image. Any news?

Fissium commented 1 year ago

I found a workaround: metrics are populated if you run inference against the models directly (at least one of them), not against the workflow. E.g.:

curl http://127.0.0.1:8080/predictions/dog_breed_wf__dog_breed_classification -T path_to_image/img.jpg
curl http://127.0.0.1:8080/predictions/dog_breed_wf__cat_dog_classification -T path_to_image/img.jpg

Then curl http://127.0.0.1:8082/metrics returns metrics.
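
Put differently, the workaround is to hit the models the workflow registered internally (named <workflow>__<model>) on the predictions endpoint instead of /wfpredict; a sketch using the names from the example above:

# Inference through the workflow endpoint completes but leaves /metrics empty (the reported bug)
curl http://127.0.0.1:8080/wfpredict/dog_breed_wf -T path_to_image/img.jpg

# Inference against a model the workflow registered internally populates the frontend metrics
curl http://127.0.0.1:8080/predictions/dog_breed_wf__dog_breed_classification -T path_to_image/img.jpg

# The metrics body should now be non-empty
curl -s http://127.0.0.1:8082/metrics | head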

pengxin233 commented 8 months ago

I also encountered this problem with pytorch/torchserve:0.6.1-gpu. Even after calling the model, it still returned an empty response.