triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
Apache License 2.0

Understanding GPU utilization #870

Open siretru opened 4 months ago

siretru commented 4 months ago

I'm having trouble interpreting some of the results...

After an Automatic Brute Search analysis, when I analyse the result_summary, I look at the Average GPU Utilization.

How is this value determined? Is it related to the number of SMs (Streaming Multiprocessors) in use? Is it measured with DCGM or nvidia-smi? We know it is quite complex to get a reliable measure of GPU usage (in particular when using tools like NVIDIA Nsight), so I'd like to check how meaningful this metric is.

What is the objective that is maximised in the Automatic Brute Search? Is it throughput?

My main question is: for a given model, once the ideal model configuration is reached, why is my GPU only being used at around 30%? What is the limiting factor (i.e. why can't we use more of the GPU to increase throughput)?

Thanks all!

nv-braf commented 4 months ago

GPU utilization is measured in Perf Analyzer and returned to MA as one of many metrics we capture and report to the user.

The default objective to maximize is throughput, and there can be a multitude of factors that cause GPU utilization to be less than 100%.

If you are interested in maximizing GPU utilization, you can specify it as the objective when profiling your model; see config.md for documentation on how to do this.
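For illustration, here is a minimal sketch of what such a profile config could look like. It assumes the `objectives` option described in config.md, and the model name, repository path, and objective key are placeholders, so check config.md for the exact keys supported by your version:

```yaml
# Sketch of a Model Analyzer profile config; paths and model name are placeholders.
model_repository: /path/to/model_repository
profile_models:
  - my_model

# Rank configurations by GPU utilization instead of the default
# throughput objective (key name assumed from config.md).
objectives:
  - gpu_utilization
```

You would then pass this file to the profile subcommand, e.g. `model-analyzer profile -f config.yaml` (assuming the `-f/--config-file` flag).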

Have you tried looking at the detailed report generated for the optimal configuration? It might point you in the right direction. You may also need to change the maximum instance count, batch size, or concurrency that MA searches.

I hope this helps.

siretru commented 4 months ago

Thank you for your reply. Could you provide more details on where the GPU utilization value comes from? Since, as you mention, this metric comes from Perf Analyzer, which is an NVIDIA tool, I can't find the answer elsewhere, and this is probably the only place I can ask this question.

Thanks

nv-braf commented 4 months ago

@matthewkotila can you provide more details?

matthewkotila commented 4 months ago

@siretru you can find information about the GPU utilization metric that Perf Analyzer offers here:

https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/docs/measurements_metrics.md#server-side-prometheus-metrics
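If it helps to see the raw source, the server-side values come from Triton's Prometheus metrics endpoint, which Perf Analyzer samples while profiling. Below is a small sketch for inspecting that endpoint directly; it assumes Triton's default metrics port (8002) and the `nv_gpu_utilization` gauge name, so adjust for your deployment:

```python
# Sketch: read Triton's Prometheus metrics endpoint and print the per-GPU
# utilization gauges that Perf Analyzer samples. Assumes the default metrics
# port (8002) and the nv_gpu_utilization metric name; adjust if your
# deployment differs.
import urllib.request

METRICS_URL = "http://localhost:8002/metrics"

with urllib.request.urlopen(METRICS_URL) as response:
    text = response.read().decode("utf-8")

for line in text.splitlines():
    # Skip Prometheus HELP/TYPE comment lines; keep only the gauge samples.
    if line.startswith("nv_gpu_utilization"):
        print(line)  # e.g. nv_gpu_utilization{gpu_uuid="GPU-..."} 0.31
```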

siretru commented 4 months ago

Hi, thanks for this. The linked doc says:

> GPU utilization: Averaged from each collection taken during stable passes. We want a number representative of all stable passes.

However, this does not provide any information on how the GPU utilization value itself is calculated. Is it utilization over time, based on the number of SMs occupied, or something else?