triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
Apache License 2.0

Installing model-analyzer: how do I specify the DCGM version? #932

Open · XIAO-FAN-5257 opened this issue 5 days ago

XIAO-FAN-5257 commented 5 days ago

I installed Model Analyzer with pip3 install triton-model-analyzer. When I run

model-analyzer profile --model-repository models/ --profile-models bls

it fails with the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/monitor/dcgm/dcgm_structs.py", line 668, in _LoadDcgmLibrary
    dcgmLib = CDLL(lib_file)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib64/libdcgm.so.3: cannot open shared object file: No such file or directory

The libdcgm files on my system:

/usr/lib/x86_64-linux-gnu/libdcgm.so
/usr/lib/x86_64-linux-gnu/libdcgm.so.2
/usr/lib/x86_64-linux-gnu/libdcgm.so.2.2.9
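The traceback asks for /usr/lib64/libdcgm.so.3, while the system only has the libdcgm.so.2 series under /usr/lib/x86_64-linux-gnu. A quick way to compare the two is to list every libdcgm.so* file in both directories and check whether the exact path from the traceback exists. This is a minimal diagnostic sketch; the two search directories are assumptions taken from this thread, not a complete list of places DCGM can be installed.

```python
import glob
import os

# Assumption: the Ubuntu multiarch directory where the existing libdcgm files
# live, plus /usr/lib64, the directory named in the traceback.
SEARCH_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/lib64"]

# The exact path model_analyzer tried to open, copied from the traceback.
EXPECTED = "/usr/lib64/libdcgm.so.3"


def find_libdcgm(dirs=SEARCH_DIRS):
    """Return a sorted list of libdcgm.so* files found in the given directories."""
    found = []
    for d in dirs:
        found.extend(glob.glob(os.path.join(d, "libdcgm.so*")))
    return sorted(found)


if __name__ == "__main__":
    for path in find_libdcgm():
        # realpath shows which versioned file a symlink actually points to
        print(path, "->", os.path.realpath(path))
    print(EXPECTED, "exists" if os.path.exists(EXPECTED) else "is missing")
```

On the machine described here, this would list only the .so.2 files and report the .so.3 path as missing, which matches the OSError above.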

Environment:

CUDA 11.8
Ubuntu 20.04
Triton Server 22.12
Python 3.8.10

Has anyone seen this before?

nv-braf commented 5 days ago

Have you tried the steps outlined here? https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html

XIAO-FAN-5257 commented 3 days ago

Thanks! I solved the problem. I copied the libdcgm.so.3 files from another machine, then modified model_analyzer/monitor/dcgm/dcgm_structs.py to point at the location of the .so file.
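The traceback shows that dcgm_structs.py loads the library with ctypes (`dcgmLib = CDLL(lib_file)`), so "pointing at the .so location" amounts to changing which path reaches `CDLL`. The sketch below illustrates that idea with a hypothetical helper, `load_first_available`, that tries a list of candidate paths in order and reports every failure. It is not the actual code of `_LoadDcgmLibrary`, and the second candidate path is an assumption; adjust the list to wherever your libdcgm.so.3 really lives. Keep in mind that edits to files inside an installed pip package are overwritten when the package is reinstalled or upgraded.

```python
import ctypes

# Hypothetical candidate list. The first entry is the path from the traceback;
# the second is an assumed location on an Ubuntu multiarch system.
CANDIDATES = [
    "/usr/lib64/libdcgm.so.3",
    "/usr/lib/x86_64-linux-gnu/libdcgm.so.3",
]


def load_first_available(paths):
    """Return (path, handle) for the first library in `paths` that loads.

    Raises OSError listing every attempted path if none can be opened.
    """
    errors = []
    for path in paths:
        try:
            # A path containing a slash is opened directly by dlopen(),
            # without consulting the dynamic-linker search path.
            return path, ctypes.CDLL(path)
        except OSError as exc:
            errors.append(f"{path}: {exc}")
    raise OSError("no usable libdcgm found:\n" + "\n".join(errors))


if __name__ == "__main__":
    path, lib = load_first_available(CANDIDATES)
    print("loaded", path)
```

Collecting every error instead of stopping at the first makes it obvious which locations were tried, which is the information the original single-path traceback lacks.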

XIAO-FAN-5257 commented 3 days ago

Do not upgrade the datacenter-gpu-manager package directly; otherwise tritonserver can no longer find DCGM and fails with:

tritonserver:  error while loading shared libraries: libdcgm.so.2: cannot open shared object file:  No such file or directory
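Taken together, the two errors in this thread suggest that both libraries have to be loadable at the same time: tritonserver 22.12 needs libdcgm.so.2, while the pip-installed model_analyzer opens /usr/lib64/libdcgm.so.3 by absolute path. A minimal sketch for checking both before starting either tool is shown below. The mapping of consumer to library is an assumption drawn from the two error messages above, and `check_dcgm_libs` is a hypothetical helper name.

```python
import ctypes

# Assumption based on this thread's two error messages:
# - tritonserver resolves libdcgm.so.2 by soname through the linker search path
# - model_analyzer opens /usr/lib64/libdcgm.so.3 by absolute path
REQUIRED = {
    "tritonserver": "libdcgm.so.2",
    "model-analyzer": "/usr/lib64/libdcgm.so.3",
}


def check_dcgm_libs(required=REQUIRED):
    """Map each consumer to True if its library can be dlopen()ed, else False."""
    status = {}
    for consumer, lib in required.items():
        try:
            # A bare soname (no slash) is looked up via LD_LIBRARY_PATH,
            # ld.so.cache and the default directories; a path with a slash
            # is opened as-is.
            ctypes.CDLL(lib)
            status[consumer] = True
        except OSError:
            status[consumer] = False
    return status


if __name__ == "__main__":
    for consumer, ok in check_dcgm_libs().items():
        print(f"{consumer}: {'ok' if ok else 'MISSING'}")
```

If either entry reports MISSING, the corresponding tool is likely to fail with the matching error quoted in this thread.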