Closed: priyanshum-cashify closed this issue 2 years ago.
📚 The doc issue

On the page https://pytorch.org/serve/metrics_api.html, there are references to the following:

> When requests are made to TorchServe, they are tracked in a counter called ts_inference_requests_total, and they are placed in a queue for an amount of time recorded as ts_queue_latency_microseconds, before finally being handled by one of the available workers, which runs the inference in an amount of time recorded as ts_inference_latency_microseconds.

How can we get the other metrics that are collected in the logs, such as CPUUtilization and MemoryUtilization, via the metrics API, or at least into Prometheus?
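The three metrics above are exposed in Prometheus text format on TorchServe's metrics endpoint (http://127.0.0.1:8082/metrics by default). As a rough illustration of how to interpret them, here is a minimal sketch that parses sample scrape output (the values and the `mnist` model name are invented for the example) and derives average per-request latencies from the cumulative counters:

```python
import re

# Invented sample scrape output in the Prometheus text format that
# `curl http://127.0.0.1:8082/metrics` returns from TorchServe.
SAMPLE = """\
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="abc",model_name="mnist",model_version="default"} 20.0
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="abc",model_name="mnist",model_version="default"} 4000.0
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="abc",model_name="mnist",model_version="default"} 100000.0
"""

# One sample line: metric_name{label="value",...} numeric_value
LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)$')

def parse_metrics(text):
    """Parse Prometheus text-format samples into {metric: {model_name: value}}."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip # HELP / # TYPE comments and blank lines
        name, labels, value = m.groups()
        model = dict(kv.split('=', 1) for kv in labels.split(',')) \
            .get('model_name', '').strip('"')
        out.setdefault(name, {})[model] = float(value)
    return out

metrics = parse_metrics(SAMPLE)
requests = metrics['ts_inference_requests_total']['mnist']
# The latency metrics are cumulative sums in microseconds, so dividing
# by the request counter gives an average latency per request.
avg_queue_us = metrics['ts_queue_latency_microseconds']['mnist'] / requests
avg_infer_us = metrics['ts_inference_latency_microseconds']['mnist'] / requests
print(f"avg queue: {avg_queue_us} us, avg inference: {avg_infer_us} us")
```

In PromQL the same averaging is usually done with `rate()` over both counters, which is what a Grafana panel would typically plot.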
We have gone through the document linked above and have integrated the metrics with Grafana. However, we would appreciate further explanation of these three terms and of how to use these metrics to monitor our model. Your insight will be very helpful for our team.
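Since the endpoint already speaks the Prometheus exposition format, one way to feed Grafana is to have Prometheus scrape it directly. A minimal scrape-config sketch, assuming TorchServe's default metrics port 8082 (the job name is arbitrary); note that whether system metrics such as CPUUtilization appear on this endpoint, rather than only in the metrics log files, depends on the TorchServe version and its metrics configuration:

```yaml
scrape_configs:
  - job_name: torchserve        # arbitrary label for this scrape target
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8082"]   # TorchServe metrics API default port
```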
Thanks,
Priyanshu Mishra
Suggest a potential alternative/fix
No response