triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps you better understand the compute and memory requirements of models served by Triton Inference Server.
Apache License 2.0

Model Analyzer GPU Memory Usage Differences #847

Open KimiJL opened 4 months ago

KimiJL commented 4 months ago

Version: nvcr.io/nvidia/tritonserver:24.01-py3-sdk

For a profiled model, the GPU Memory Usage (MB) shown in results/metrics-model-gpu.csv is different from model result_summary.pdf.

In my case, metrics-model-gpu.csv shows 1592.8 while the pdf report shows 1031.

It could be my misunderstanding, but do these two metrics represent the same thing? I am looking for the maximum GPU usage for a given model, so which is the more accurate result?

KimiJL commented 4 months ago

Additional Context:

I am using an instance with two GPUs, though the model is limited to a single instance.

I have noticed that if I add up the GPU memory of both GPUs from the csv and then divide by 2, I get (470.8 + 1592.8) / 2 = 1031.8, which is close to the pdf result. Could this be a coincidence?

tgerdesnv commented 3 months ago

Hi @KimiJL, sorry for the slow response. I just returned from vacation.

I suspect that your observation is not a coincidence and that there is a bug. We will have to investigate further.

May I ask, were you running in local mode? Or docker or remote?

KimiJL commented 3 months ago

Hi @tgerdesnv thanks for the response,

I was running with --triton-launch-mode=docker

tgerdesnv commented 3 months ago

@KimiJL I have confirmed that the values in the pdfs are in fact the averages across the GPUs. The values in metrics-model-gpu.csv are the raw per-GPU values. So, in your case, the total maximum memory usage by the model on your machine would be 470.8 + 1592.8 = 2063.6 MB.
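To make the relationship between the two reports concrete, here is a minimal sketch of the arithmetic described above, using the per-GPU values from this thread (the variable names are illustrative, not from Model Analyzer's code):

```python
# Per-GPU "GPU Memory Usage (MB)" values as reported in
# metrics-model-gpu.csv for the two GPUs in this issue.
per_gpu_mb = [470.8, 1592.8]

# The pdf summary reports the average across GPUs,
# which is why it shows a smaller number than either raw value's sum.
average_mb = sum(per_gpu_mb) / len(per_gpu_mb)

# The model's total memory footprint on the machine is the sum
# of the per-GPU values from the csv.
total_mb = sum(per_gpu_mb)

print(f"pdf (average): {average_mb:.1f} MB")   # matches the 1031 in the report
print(f"total:         {total_mb:.1f} MB")
```

This reproduces the reporter's observation: the pdf's 1031 is the cross-GPU average, while the true total is the sum of the csv rows.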

I will fix Model Analyzer to show total memory usage, or clarify the labels to indicate that it is average memory usage.

KimiJL commented 3 months ago

@tgerdesnv great, thank you for the clarification, that makes sense!