pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Metric API - Integration with Prometheus & Grafana #1946

Closed priyanshum-cashify closed 2 years ago

priyanshum-cashify commented 2 years ago

📚 The doc issue

On the page https://pytorch.org/serve/metrics_api.html#, there are references to the following:

  1. ts_inference_latency_microseconds
  2. ts_queue_latency_microseconds
  3. ts_inference_requests_total

We have gone through the documentation linked above and have integrated TorchServe with Grafana.

However, we would appreciate a further explanation of these three metrics and of how to use them to monitor our model. Your insight will be very helpful for our team.
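
For reference, the raw counters can be inspected directly from the Metrics API endpoint; below is a minimal sketch assuming a local TorchServe instance with the default metrics address (http://127.0.0.1:8082/metrics).

```python
# Minimal sketch (assumes a local TorchServe with the default Metrics API
# address http://127.0.0.1:8082/metrics): print the raw Prometheus-format
# samples for the three counters discussed above.
import requests

METRICS_URL = "http://127.0.0.1:8082/metrics"
WANTED = (
    "ts_inference_requests_total",
    "ts_queue_latency_microseconds",
    "ts_inference_latency_microseconds",
)

response = requests.get(METRICS_URL)
response.raise_for_status()

for line in response.text.splitlines():
    # Metric sample lines start with the metric name; HELP/TYPE lines start with '#'.
    if line.startswith(WANTED):
        print(line)
```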

Thanks Priyanshu Mishra

Suggest a potential alternative/fix

No response

msaroufim commented 2 years ago

When requests are made to TorchServe, they are tracked in a counter called ts_inference_requests_total. Each request is then placed in a queue, and the time it spends waiting there is recorded as ts_queue_latency_microseconds. Once one of the available workers picks the request up and runs inference on it, the time taken by the inference itself is recorded as ts_inference_latency_microseconds.
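
To build Grafana panels from these counters, a common pattern is to divide the rate of each latency counter by the rate of the request counter to get an average per-request latency. Below is a minimal sketch using Prometheus's HTTP query API; the Prometheus address, the 5m rate window, and the choice of querying from Python rather than directly in Grafana are illustrative assumptions.

```python
# Sketch: example PromQL expressions built from the three TorchServe counters,
# evaluated through Prometheus's HTTP API. Assumes Prometheus is already
# scraping the TorchServe metrics endpoint and is reachable at localhost:9090;
# the 5m rate window is arbitrary.
import requests

PROMETHEUS_QUERY_URL = "http://localhost:9090/api/v1/query"

QUERIES = {
    # Requests per second reaching TorchServe.
    "request_rate": "rate(ts_inference_requests_total[5m])",
    # Average time a request spends waiting in the queue (microseconds).
    "avg_queue_latency_us": (
        "rate(ts_queue_latency_microseconds[5m])"
        " / rate(ts_inference_requests_total[5m])"
    ),
    # Average time a worker spends running inference (microseconds).
    "avg_inference_latency_us": (
        "rate(ts_inference_latency_microseconds[5m])"
        " / rate(ts_inference_requests_total[5m])"
    ),
}

for name, promql in QUERIES.items():
    result = requests.get(PROMETHEUS_QUERY_URL, params={"query": promql}).json()
    print(name, result["data"]["result"])
```

The same PromQL expressions can be pasted directly into the query field of a Grafana panel backed by the Prometheus data source.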

frankiedrake commented 2 years ago

> When requests are made to TorchServe, they are tracked in a counter called ts_inference_requests_total. Each request is then placed in a queue, and the time it spends waiting there is recorded as ts_queue_latency_microseconds. Once one of the available workers picks the request up and runs inference on it, the time taken by the inference itself is recorded as ts_inference_latency_microseconds.

How can we get the other metrics that are collected in the logs, such as CPUUtilization and MemoryUtilization, via the Metrics API, or at least into Prometheus?

maaquib commented 2 years ago

@frankiedrake We are working on unifying the metrics (statsd-style logs and the Prometheus endpoint). Please follow this RFC to track the work.