Open ZiMengSheng opened 3 months ago
What happened:
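DCGM uses the kubelet PodResources interface to expose per-Pod GPU metrics, so it depends on the GPU allocation result recorded by kubelet. In Koordinator, however, GPUs are allocated by the scheduler rather than by kubelet, so DCGM cannot attribute GPU metrics to Pods correctly.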
What you expected to happen:
Users should have some way to see the same per-Pod GPU metrics that DCGM provides.
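One possible direction, sketched here only as an illustration: since the scheduler owns the allocation, a small sidecar or relabeling step could rebuild the GPU-to-Pod mapping from Pod metadata instead of from PodResources. The sketch below assumes the allocation is stored as JSON in a Pod annotation named `scheduling.koordinator.sh/device-allocated`, with a `gpu` list whose entries carry a `minor` field. Both the key and the payload shape are assumptions and should be checked against the Koordinator version in use. The helper names are hypothetical.

```python
import json

# ASSUMPTION: annotation key and JSON shape are not confirmed by this issue;
# verify them against your Koordinator release before relying on this.
DEVICE_ALLOCATED_KEY = "scheduling.koordinator.sh/device-allocated"


def gpu_minors_from_annotations(annotations):
    """Return the sorted GPU minor numbers recorded in a Pod's annotations."""
    raw = (annotations or {}).get(DEVICE_ALLOCATED_KEY)
    if not raw:
        return []
    allocation = json.loads(raw)
    return sorted(entry["minor"] for entry in allocation.get("gpu", []))


def build_gpu_to_pod_index(pods):
    """Map GPU minor -> list of (namespace, name) for the Pods on one node.

    `pods` is a list of Pod objects as plain dicts, e.g. the `items` array
    from `kubectl get pods -o json` filtered to a single node.
    """
    index = {}
    for pod in pods:
        meta = pod["metadata"]
        for minor in gpu_minors_from_annotations(meta.get("annotations")):
            index.setdefault(minor, []).append((meta["namespace"], meta["name"]))
    return index
```

The resulting index could then be joined with device-level DCGM metrics on the GPU index to attach `pod` and `namespace` labels. How the join is done (relabeling, a custom exporter, or a recording rule) is left open here.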
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version):