triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Running separate DCGM on Kubernetes cluster #7597

Closed: ysk24ok closed this issue 1 month ago

ysk24ok commented 2 months ago

In the release notes for 24.08, there is a known issue:

Triton metrics might not work if the host machine is running a separate DCGM agent on bare-metal or in a container.

I also found https://github.com/triton-inference-server/server/issues/3897#issuecomment-1035414009, which said:

Triton uses the library version of DCGM which is not allowed to co-exist with the container version of DCGM

I'm using Google Kubernetes Engine and followed this page to run the nvidia-dcgm and nvidia-dcgm-exporter pods (deployed as DaemonSets). It turned out that Triton metrics worked even with the separate DCGM running in another pod.

Triton metrics:

nv_gpu_utilization{gpu_uuid="GPU-c1eb1e78-d69a-c334-9ce5-2cec7c8399a1"} 0

DCGM exporter metrics:

DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-c1eb1e78-d69a-c334-9ce5-2cec7c8399a1",device="nvidia0",modelName="Tesla T4",Hostname="gke-test-dcgm-default-pool-270c4b72-ct4k",container="triton-server",namespace="default",pod="triton-86c854b54b-84l8q"} 0
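For reference, here is a rough sketch of how the two endpoints could be compared per GPU UUID. This is not from the issue itself; the localhost URLs and ports are assumptions (Triton's metrics endpoint defaults to port 8002 and dcgm-exporter commonly listens on 9400), so in a real cluster they would need to be replaced with the actual pod or service addresses.

```python
# Sketch: scrape Triton and dcgm-exporter metrics endpoints and compare
# per-GPU utilization keyed by GPU UUID. URLs/ports are assumptions and
# must be adjusted to the pod/service addresses in your cluster.
import re
import urllib.request

TRITON_METRICS_URL = "http://localhost:8002/metrics"  # assumed Triton default
DCGM_EXPORTER_URL = "http://localhost:9400/metrics"   # assumed dcgm-exporter default


def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def gpu_util_by_uuid(text: str, metric: str, uuid_label: str) -> dict:
    """Extract `<metric>{... uuid_label="GPU-..." ...} <value>` lines."""
    pattern = re.compile(
        rf'^{metric}\{{[^}}]*{uuid_label}="([^"]+)"[^}}]*\}}\s+([0-9.eE+-]+)',
        re.MULTILINE,
    )
    return {uuid: float(value) for uuid, value in pattern.findall(text)}


# Triton exposes nv_gpu_utilization with a gpu_uuid label;
# dcgm-exporter exposes DCGM_FI_DEV_GPU_UTIL with a UUID label.
triton = gpu_util_by_uuid(fetch(TRITON_METRICS_URL), "nv_gpu_utilization", "gpu_uuid")
dcgm = gpu_util_by_uuid(fetch(DCGM_EXPORTER_URL), "DCGM_FI_DEV_GPU_UTIL", "UUID")

for uuid in sorted(set(triton) | set(dcgm)):
    print(f"{uuid}: triton={triton.get(uuid)} dcgm={dcgm.get(uuid)}")
```

In my case both sources reported values for the same GPU UUID, which is what the metric samples above show.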

Based on this observation, I have several questions.

Tested versions

ysk24ok commented 1 month ago

The same issue is being discussed in https://github.com/NVIDIA/DCGM/issues/191.