triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Dynamic batching: log the created batch size #5802

Open cceyda opened 1 year ago

cceyda commented 1 year ago

Is your feature request related to a problem? Please describe.

How can I see the total batch size that dynamic batching creates in the logs? I can see how many requests are grouped together by dynamic batching when I grep the logs with --log-verbose 2, like below. But I don't know the exact batch size it creates, because every request I'm sending can have a different batch size.

I0517 03:53:25.171743 27359 tensorrt.cc:334] model span_marker, instance span_marker_0_1, executing 11 requests
I0517 03:53:25.175241 27359 tensorrt.cc:334] model span_marker, instance span_marker_0_2, executing 4 requests
I0517 03:53:25.229200 27359 tensorrt.cc:334] model span_marker, instance span_marker_0_3, executing 3 requests

i.e., 4 requests could each have batch size 1, for a total of 4; or 4 requests could each have batch size 4, for a total of 16.

This would help when deciding on a max_queue_delay_microseconds value.
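For context, this is roughly the dynamic_batching block in config.pbtxt that I am trying to tune (the values here are illustrative, not my actual config):

dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}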

Describe the solution you'd like

It would be nice if this info were easily grepable (or part of the reported metrics; I haven't connected a metrics dashboard yet, so I don't know if it is already there). For example:

I0517 03:53:25.171743 27359 tensorrt.cc:334] model span_marker, instance span_marker_0_1, executing 11 requests, batch size 16

Describe alternatives you've considered

If I wanted this info now, I would have to parse the entire log row by row and sum the batch sizes reported in between each request, which I would rather not do. A sketch of that approach is below.
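A minimal Python sketch of what that parsing would look like, assuming only the "executing N requests" lines shown above (it tallies request counts per instance; the actual per-request batch sizes would still have to be dug out of other verbose-log lines):

import re
import sys
from collections import Counter

# Matches the verbose-log lines shown above, e.g.
# "... model span_marker, instance span_marker_0_1, executing 11 requests"
PATTERN = re.compile(r"model (\S+), instance (\S+), executing (\d+) requests")

requests_per_instance = Counter()
batches_per_instance = Counter()

for line in sys.stdin:
    match = PATTERN.search(line)
    if match:
        _model, instance, num_requests = match.groups()
        requests_per_instance[instance] += int(num_requests)
        batches_per_instance[instance] += 1

for instance, total in requests_per_instance.items():
    print(f"{instance}: {total} requests across {batches_per_instance[instance]} batches")

(Usage would be something like python tally_batches.py < triton.log, where tally_batches.py is whatever you name the script.)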

Additional context

Using Triton 23.04.

cceyda commented 1 year ago

I guess this info is available through the metrics. But sadly, even though Grafana says it can connect to the metrics port as a Prometheus data source, it doesn't show any data... It would be nice if there were a guide and a Grafana dashboard template that we could import. I don't know Grafana that well, so I have no idea what is wrong. I'm launching Triton with --metrics-port 8081 and adding it as a source in Grafana like below.

[screenshot: Grafana Prometheus data source configuration]

says "Data source is working" when saving. But then shows nothing when queried 🤷‍♀️ I can see the text logs when I visit http://localhost:8081/metrics

yeahdongcn commented 1 year ago

The dashboard template is located at: https://github.com/triton-inference-server/server/blob/main/deploy/k8s-onprem/dashboard.json

cceyda commented 1 year ago

I still can't see any data despite the "Data source is working" message. Is there a required/recommended version of Prometheus/Grafana?

cceyda commented 1 year ago

OK, so I didn't know I also had to start and set up the Prometheus server myself! I thought the /metrics port was the Prometheus server; the docs should be clearer on that. This guide helped me: https://blog.salesforceairesearch.com/benchmarking-tensorrt-inference-server/#a-minimalistic-guide-to-setting-up-the-inference-server:~:text=log%2Dverbose%3Dtrue-,(BONUS),-Step%204%3A%20Metrics
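For anyone else who hits this: Triton's /metrics endpoint is just a scrape target, so a separate Prometheus server has to be pointed at it. A minimal prometheus.yml would look roughly like this (assuming Triton's metrics port is 8081 as above):

scrape_configs:
  - job_name: "triton"
    scrape_interval: 5s               # how often to pull /metrics
    static_configs:
      - targets: ["localhost:8081"]   # Triton's --metrics-port

Grafana then points at the Prometheus server itself (by default http://localhost:9090), not at Triton's metrics port directly.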

cceyda commented 1 year ago

OK, I think this is still a valid request, because the reported metrics are per model. I would like to see the metrics per instance of the model, like so:

I0517 03:53:25.171743 27359 tensorrt.cc:334] model span_marker, instance span_marker_0_1, executing 11 requests, batch size 16
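For what it's worth, the average batch size can already be derived in PromQL from Triton's standard counters (assuming the nv_inference_count and nv_inference_exec_count metrics with a model label, as exposed on /metrics), but only per model, not per instance:

# inferences performed / batch executions over a 1m window = average batch size
rate(nv_inference_count{model="span_marker"}[1m])
  / rate(nv_inference_exec_count{model="span_marker"}[1m])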