Gpumanager is unable to control GPU threshold and GPU memory.

yangcheng-dev commented 8 months ago

After deploying gpu-manager, the GPU memory and compute limits do not seem to be enforced. I set the pod quotas as shown below, but neither the GPU memory limit nor the compute limit takes effect. Stranger still, if I leave the same pod running for more than 60 minutes, the limits are likely to take effect. There are no obvious errors in the gpu-manager logs.

    resources:
      limits:
        tencent.com/vcuda-core: "50"
        tencent.com/vcuda-memory: "32"
      requests:
        tencent.com/vcuda-core: "50"
        tencent.com/vcuda-memory: "32"
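To check whether the limits are really being ignored, it helps to first work out what the quota above should translate to. The sketch below assumes the units described in the gpu-manager README (100 `vcuda-core` units per full GPU, 256 MiB per `vcuda-memory` unit); those units are not stated in this thread. The helper names, the sample readings, and the `util_slack` tolerance are hypothetical and only illustrate the comparison.

```python
# Assumed units (from the gpu-manager README, not from this thread):
#   tencent.com/vcuda-core:   100 units = one full GPU
#   tencent.com/vcuda-memory: 1 unit    = 256 MiB
CORE_UNITS_PER_GPU = 100
MIB_PER_MEMORY_UNIT = 256


def expected_limits(vcuda_core: int, vcuda_memory: int) -> tuple[float, int]:
    """Return (GPU share as a fraction, memory cap in MiB) for a quota."""
    return vcuda_core / CORE_UNITS_PER_GPU, vcuda_memory * MIB_PER_MEMORY_UNIT


def exceeds_quota(used_mib: int, util_percent: float,
                  vcuda_core: int, vcuda_memory: int,
                  util_slack: float = 10.0) -> bool:
    """Return True if observed usage is clearly above what the quota allows.

    util_slack is a hypothetical tolerance in percentage points, since
    utilization limiting is not expected to be exact.
    """
    share, mem_cap_mib = expected_limits(vcuda_core, vcuda_memory)
    return used_mib > mem_cap_mib or util_percent > share * 100 + util_slack


if __name__ == "__main__":
    share, cap = expected_limits(50, 32)
    print(f"expected: {share:.0%} of one GPU, {cap} MiB")
    # Hypothetical readings taken inside the pod, for example from nvidia-smi.
    print(exceeds_quota(used_mib=12000, util_percent=95.0,
                        vcuda_core=50, vcuda_memory=32))
```

Under these assumptions, the quota in this report corresponds to half of one GPU and 8192 MiB of memory, so a pod that holds noticeably more than that is not being limited.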

ferris-cx commented 3 months ago

Has this problem already been solved?

Mirai233 commented 2 months ago

Just remove the cuGetProcessAddress implementation; it is what causes this problem.

tkestack / gpu-manager, issue #192