premAI-io / prem-operator

📡 Deploy AI models and apps to Kubernetes without developing a hernia
https://premai.io?utm_source=prem-operator
Apache License 2.0
17 stars 2 forks

Support For Multiple GPUs And NVIDIA MIG(s)? #39

Open vkhitrin opened 1 month ago

vkhitrin commented 1 month ago

Apologies if I have missed this topic and it is already covered in the documentation/CRDs.

When using an NVIDIA accelerator in aideployments.premlabs.io, is it possible to request multiple GPUs? In a cluster with multiple GPUs:

kubectl get node lab-infra-node-3 -o yaml | grep 'nvidia.com/gpu:'
    nvidia.com/gpu: "3"

Only a single GPU was requested by the deployment:

kubectl -n premai get pods hermes-758f784656-79fbv -o yaml | grep 'nvidia.com/gpu'
        nvidia.com/gpu: "1"

In this case, I have used a deployment from https://github.com/premAI-io/prem-operator/blob/main/examples/big-agi.yaml. I would be interested in providing more GPUs to a single deployment.
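For context, in a plain Kubernetes pod spec this would normally be expressed through a resource limit handled by the NVIDIA device plugin; whether the AIDeployment CRD exposes an equivalent setting is exactly what I am asking. A sketch (the image name is a placeholder):

```yaml
# Standard Kubernetes way of requesting multiple GPUs via the NVIDIA
# device plugin. Shown for reference only -- the question is whether
# aideployments.premlabs.io can express the same thing.
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-example
spec:
  containers:
    - name: inference
      image: example/inference:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 2  # request two full GPUs
```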

Also, related to MIGs, in a cluster where GPUs are not labeled as nvidia.com/gpu:

kubectl get nodes node-with-mig -o yaml | grep 'nvidia.com/mig'
    nvidia.com/mig-7g.40gb.count: "4"
    nvidia.com/mig-7g.40gb.engines.copy: "7"
    nvidia.com/mig-7g.40gb.engines.decoder: "5"
    nvidia.com/mig-7g.40gb.engines.encoder: "0"
    nvidia.com/mig-7g.40gb.engines.jpeg: "1"
    nvidia.com/mig-7g.40gb.engines.ofa: "1"
    nvidia.com/mig-7g.40gb.memory: "40192"
    nvidia.com/mig-7g.40gb.multiprocessors: "98"
    nvidia.com/mig-7g.40gb.product: NVIDIA-A100-SXM4-40GB-MIG-7g.40gb
    nvidia.com/mig-7g.40gb.replicas: "1"
    nvidia.com/mig-7g.40gb.slices.ci: "7"
    nvidia.com/mig-7g.40gb.slices.gi: "7"
    nvidia.com/mig.capable: "true"
    nvidia.com/mig.config: all-7g.40gb
    nvidia.com/mig.config.state: failed
    nvidia.com/mig.strategy: mixed
    nvidia.com/mig-1g.5gb: "0"
    nvidia.com/mig-2g.10gb: "0"
    nvidia.com/mig-3g.20gb: "0"
    nvidia.com/mig-7g.40gb: "4"

Will aideployments.premlabs.io be able to select a MIG? I skimmed the codebase briefly and observed that the nvidia.com/gpu label is hard-coded: https://github.com/premAI-io/prem-operator/blob/0322a6b8f9451349d7896030b75247707c1f3131/controllers/constants/labels.go#L4 (Apologies again, I am not able to test this myself on the MIG cluster at the moment.)
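For reference, my understanding is that with mig.strategy: mixed (as on the node above) the NVIDIA device plugin advertises each MIG profile as its own extended resource, so a plain pod would request a slice like this (a sketch; the image name is a placeholder):

```yaml
# Sketch: under the mixed MIG strategy, each profile is exposed as a
# separate extended resource (e.g. nvidia.com/mig-7g.40gb) rather
# than the generic nvidia.com/gpu resource.
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  containers:
    - name: inference
      image: example/inference:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/mig-7g.40gb: 1  # one 7g.40gb MIG slice
```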

richiejp commented 1 month ago

Hello, we currently use just one GPU per deployment because the models we work with have been relatively small so far.

It works with MIG, but you have to relabel the nodes. This can be done with the AutoNodeLabeler (https://github.com/premAI-io/prem-operator/blob/main/docs/getting_started.md#autonodelabeler) which was introduced for this purpose.

Sorry to say, but we ran into a number of frictions with MIG. One is that it broke DeepSpeed-MII because of the way it selects the active GPU. I patched that, and you can simply avoid DeepSpeed-MII, but I think similar issues are likely to come up elsewhere.

Another issue is performance: assigning multiple MIG devices to one pod is not the same as disabling MIG and assigning the whole GPU. Exactly what the implications are, I'm not sure; we didn't test it. Also, as of our last check, disabling MIG or changing the partition layout requires a node restart.

Something to note is that if the model is smaller than the VRAM, a number of inference engines will still make full use of the remaining VRAM as a cache. So it is not a complete waste to over-allocate VRAM to one model, if that is what is motivating the use of MIG.