Open cceyda opened 1 year ago
I guess this info is available through the metrics. But sadly, even though Grafana says it can connect to the metrics port as a Prometheus data source, it doesn't show any data... It would be nice if there were a guide & a Grafana dashboard template that we could import. I don't know Grafana that well, so I have no idea what's wrong. I'm launching Triton with --metrics-port 8081
and adding it as a data source in Grafana like below.
It says "Data source is working" when saving, but then shows nothing when queried 🤷♀️ I can see the plain-text metrics when I visit http://localhost:8081/metrics
The dashboard template is located at: https://github.com/triton-inference-server/server/blob/main/deploy/k8s-onprem/dashboard.json
I still can't see any data despite the "Data source is working" message. Is there a required/recommended version of Prometheus/Grafana?
Ok, so I didn't know I also had to start & set up the Prometheus server myself! I thought the /metrics port was the Prometheus server; the docs should be clearer on that. This guide helped me: https://blog.salesforceairesearch.com/benchmarking-tensorrt-inference-server/#a-minimalistic-guide-to-setting-up-the-inference-server:~:text=log%2Dverbose%3Dtrue-,(BONUS),-Step%204%3A%20Metrics
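For anyone else who hits this: Triton only exposes a /metrics endpoint; you still need a Prometheus server to scrape it, and Grafana then reads from Prometheus. A minimal prometheus.yml sketch (assuming Triton is running locally with --metrics-port 8081 as above; the job name is just an example):

```
# prometheus.yml (sketch): scrape Triton's metrics endpoint.
# Point Grafana at this Prometheus server, not at Triton directly.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "triton"          # arbitrary label for this scrape target
    static_configs:
      - targets: ["localhost:8081"]   # matches --metrics-port 8081
```

Then start Prometheus with this file and add http://localhost:9090 (Prometheus's default port) as the Grafana data source.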
Ok, I think this is still a valid request, because the metrics reported are per model. I would like to see the metrics per instance of the model, like so:
I0517 03:53:25.171743 27359 tensorrt.cc:334] model span_marker, instance span_marker_0_1, executing 11 requests, batch size 16
Is your feature request related to a problem? Please describe. How can I see the total batch size that dynamic batching creates in the logs? I can see how many requests are grouped by dynamic batching when I grep the logs with
--log-verbose 2
like below. But I don't know the exact batch size it creates, because every request I'm sending can have a different batch size. E.g.: 4 requests could each have batch size 1 (a total of 4), or 4 requests could each have batch size 4 (a total of 16).
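The arithmetic above is the whole problem in miniature: the request count alone doesn't tell you the total batch size. A trivial illustration (the numbers are the ones from the example above):

```python
# Two hypothetical batches of 4 requests each: same request count,
# very different total batch size.
requests_a = [1, 1, 1, 1]  # 4 requests with batch size 1 each
requests_b = [4, 4, 4, 4]  # 4 requests with batch size 4 each

print(len(requests_a), sum(requests_a))  # 4 4
print(len(requests_b), sum(requests_b))  # 4 16
```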
This would help in deciding on a
max_queue_delay_microseconds
Describe the solution you'd like It would be nice if this info were easily greppable (or part of the reported metrics; I haven't connected a metrics dashboard yet, so I don't know if it already is).
I0517 03:53:25.171743 27359 tensorrt.cc:334] model span_marker, instance span_marker_0_1, executing 11 requests, batch size 16
Describe alternatives you've considered If I wanted this info now, I would have to parse the entire log row by row & sum the batch sizes reported in between each request, which I would rather not do.
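For reference, the row-by-row workaround described above can be sketched in a few lines of Python. The regex is based only on the log line quoted in this issue; the script and function names are mine, not part of any Triton tooling:

```python
import re

# Matches verbose log lines like:
#   ... model span_marker, instance span_marker_0_1, executing 11 requests, batch size 16
LINE_RE = re.compile(
    r"model (?P<model>\S+), instance (?P<instance>\S+), "
    r"executing (?P<requests>\d+) requests, batch size (?P<batch>\d+)"
)

def summarize(log_lines):
    """Sum request counts and batch sizes per model instance."""
    totals = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            inst = m.group("instance")
            reqs, batch = totals.get(inst, (0, 0))
            totals[inst] = (reqs + int(m.group("requests")),
                            batch + int(m.group("batch")))
    return totals

sample = [
    "I0517 03:53:25.171743 27359 tensorrt.cc:334] model span_marker, "
    "instance span_marker_0_1, executing 11 requests, batch size 16",
]
print(summarize(sample))  # {'span_marker_0_1': (11, 16)}
```

This works, but it is exactly the kind of post-hoc log scraping the feature request is meant to avoid.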
Additional context Using Triton 23.04.