I'm running TorchServe in WSL2, and there are three issues with the metrics:

1. Even if the `metrics_config` parameter in `ts.config` points to a non-existing file, everything works without any problems. It looks like the parameter has no effect in my case.
2. Even if I comment out or remove some metrics (`ts_metrics` or `model_metrics`), I can still see them in `ts_metrics.log` or `model_metrics.log`.
3. I can't get some of the ts metrics via the metrics API. There are only the `ts_queue_latency_microseconds` and `ts_inference_requests_total` metrics:
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="57984a8f-c19a-4c93-b9cd-cc7eb8b1fa55",model_name="doc_model",model_version="default",} 4925586.4
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="57984a8f-c19a-4c93-b9cd-cc7eb8b1fa55",model_name="doc_model",model_version="default",} 3.0
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="57984a8f-c19a-4c93-b9cd-cc7eb8b1fa55",model_name="doc_model",model_version="default",} 291.4
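As a side note, the metrics API returns the Prometheus text exposition format, which is easy to inspect programmatically. Below is a minimal parsing sketch (pure stdlib; the sample line is copied from the output above, and the parser only handles simple `name{labels} value` lines, ignoring `# HELP`/`# TYPE` comments):

```python
import re

# Sample line taken from the TorchServe metrics API output above.
SAMPLE = """\
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="57984a8f-c19a-4c93-b9cd-cc7eb8b1fa55",model_name="doc_model",model_version="default",} 3.0
"""

# Matches: metric_name{label="value",...} numeric_value
LINE_RE = re.compile(r'^(\w+)\{(.*?)\}\s+([0-9.eE+-]+)$')


def parse_metrics(text):
    """Return {metric_name: (labels_dict, value)} for each sample line."""
    out = {}
    for line in text.splitlines():
        if line.startswith('#') or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines without labels; fine for this sketch
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="(.*?)"', raw_labels))
        out[name] = (labels, float(value))
    return out


labels, value = parse_metrics(SAMPLE)["ts_inference_requests_total"]
```

After parsing, `value` holds the counter reading and `labels` the dimension map, which makes it straightforward to diff the set of metric names actually exposed against what the config is expected to enable.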
UPD:
I've updated torchserve and torch-model-archiver to 0.9.0, and now I can see more metrics from metrics.yaml:
# HELP MemoryUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE MemoryUtilization gauge
MemoryUtilization{Level="Host",Hostname="msi",} 15.1
# HELP DiskUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE DiskUtilization gauge
DiskUtilization{Level="Host",Hostname="msi",} 84.1
# HELP GPUMemoryUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE GPUMemoryUtilization gauge
GPUMemoryUtilization{Level="Host",DeviceId="0",Hostname="msi",} 21.27685546875
# HELP DiskUsage Torchserve prometheus gauge metric with unit: Gigabytes
# TYPE DiskUsage gauge
DiskUsage{Level="Host",Hostname="msi",} 200.3065414428711
# HELP NameOfHistogramMetric Torchserve prometheus histogram metric with unit: ms
# TYPE NameOfHistogramMetric histogram
# HELP ts_inference_requests_total Torchserve prometheus counter metric with unit: Count
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{model_name="doc_model",model_version="default",hostname="msi",} 4.0
# HELP MemoryUsed Torchserve prometheus gauge metric with unit: Megabytes
# TYPE MemoryUsed gauge
MemoryUsed{Level="Host",Hostname="msi",} 3460.9609375
# HELP WorkerThreadTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE WorkerThreadTime gauge
WorkerThreadTime{Level="Host",Hostname="msi",} 3.0
# HELP WorkerLoadTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE WorkerLoadTime gauge
WorkerLoadTime{WorkerName="W-9000-doc_model_1.0",Level="Host",Hostname="msi",} 7894.0
# HELP ts_inference_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{model_name="doc_model",model_version="default",hostname="msi",} 1.66192689E7
# HELP DiskAvailable Torchserve prometheus gauge metric with unit: Gigabytes
# TYPE DiskAvailable gauge
DiskAvailable{Level="Host",Hostname="msi",} 37.860321044921875
# HELP GPUMemoryUsed Torchserve prometheus gauge metric with unit: Megabytes
# TYPE GPUMemoryUsed gauge
GPUMemoryUsed{Level="Host",DeviceId="0",Hostname="msi",} 1743.0
# HELP PredictionTime Torchserve prometheus gauge metric with unit: ms
# TYPE PredictionTime gauge
PredictionTime{ModelName="doc_model",Level="Model",Hostname="msi",} 2738.21
# HELP MemoryAvailable Torchserve prometheus gauge metric with unit: Megabytes
# TYPE MemoryAvailable gauge
MemoryAvailable{Level="Host",Hostname="msi",} 21698.54296875
# HELP CPUUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE CPUUtilization gauge
CPUUtilization{Level="Host",Hostname="msi",} 0.0
# HELP QueueTime Torchserve prometheus gauge metric with unit: Milliseconds
# TYPE QueueTime gauge
QueueTime{Level="Host",Hostname="msi",} 0.0
# HELP HandlerTime Torchserve prometheus gauge metric with unit: ms
# TYPE HandlerTime gauge
# HELP Requests4XX Torchserve prometheus counter metric with unit: Count
# TYPE Requests4XX counter
# HELP ts_queue_latency_microseconds Torchserve prometheus counter metric with unit: Microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{model_name="doc_model",model_version="default",hostname="msi",} 6382220.600000001
# HELP GPUUtilization Torchserve prometheus gauge metric with unit: Percent
# TYPE GPUUtilization gauge
GPUUtilization{Level="Host",DeviceId="0",Hostname="msi",} 0.0
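For reference, the metrics shown above are the ones defined in the file that `metrics_config` points to. A minimal sketch of such a file is below; the structure follows what I'd expect from TorchServe's default `metrics.yaml` (dimension anchors plus `ts_metrics`/`model_metrics` sections), but the exact keys and names here are illustrative assumptions, not copied from a verified config:

```yaml
# Hypothetical minimal metrics config; the real default metrics.yaml ships with TorchServe.
dimensions:
  - &model_name "ModelName"
  - &level "Level"

ts_metrics:
  gauge:
    - name: QueueTime
      unit: Milliseconds
      dimensions: [*level, "Hostname"]

model_metrics:
  gauge:
    - name: HandlerTime
      unit: ms
      dimensions: [*model_name, *level, "Hostname"]
```

If a metric still appears after being removed from this file, that would support the suspicion that the configured path is not actually being read.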