hashicorp / nomad-device-nvidia

Nomad device driver for Nvidia GPU

How would I ensure that a specific task gets a specific GPU? #56

Open BradyBonnette opened 4 days ago

BradyBonnette commented 4 days ago

I am trying to run Nomad (v1.9.0) with this plugin (v1.1.0) in a multi-GPU setup (H100). The plugin is installed and runs as it should.

What I would like to do is create 8 separate tasks in Nomad (could be any number, but using 8 as an example) where each task gets a specific GPU and only that GPU. E.g.

Task 1 => GPU 0
Task 2 => GPU 1
Task 3 => GPU 2
...
Task 8 => GPU 7

According to the documentation, I can supply NVIDIA_VISIBLE_DEVICES via an env {} block in the task, but doing so causes the GPU to be selected randomly instead of forcing the task onto that specific GPU. For example, for a particular task I would set env { NVIDIA_VISIBLE_DEVICES = <UUID OF SPECIFIC DEVICE> }, and each time the task started it would be placed on any of the other 7 devices at random.
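For concreteness, the stanza I tried looked roughly like this (the UUID here is just a placeholder):

task "mytask" {
  resources {
    device "nvidia/gpu" {
      count = 1
    }
  }
  env {
    # Intended to pin the task to one device; in practice this
    # value gets overridden at runtime (see below).
    NVIDIA_VISIBLE_DEVICES = "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  }
}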

I also tried setting a constraint such as:

task "mytask" {
  resources {
    device "nvidia/gpu" {
      count = 1
      constraint {
        attribute = "${device.attr.uuid}"
        value = "<GUID>"
      }
    }
  }
}

and that did not work either.

Is there something else I could try?

EDIT: Forgot to add that I have traditionally accomplished this with the Docker CLI + the NVIDIA Container Toolkit by issuing something like docker run --gpus '"device=0"' ....

BradyBonnette commented 4 days ago

Some findings of interest.

Following the instructions here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Assume I have the following UUIDs according to nvidia-smi -L on the host (i.e., outside of any container):

UUID: GPU-684586d6-bed0-e6a7-78e2-bf784635fd1b
UUID: GPU-49c859a6-c61a-0d40-adc6-8b463b489b00
UUID: GPU-13df2a8b-06ec-1168-9d95-d8a058bca48b
UUID: GPU-97518ddc-a1f1-ebaf-e0e9-18653b33fccc
UUID: GPU-00c18080-96ba-7cca-7a2a-c20ef3db6911
UUID: GPU-db76219f-0eaf-bd49-439e-7d82ae8aa526
UUID: GPU-ca3d6363-ee91-b096-d9b3-3e150c5c134c
UUID: GPU-675af948-8174-344c-ab1e-15075292349d

If I run the container manually like this:

sudo docker run -it --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=GPU-684586d6-bed0-e6a7-78e2-bf784635fd1b myimage /bin/bash

and then run nvidia-smi -L inside the container, I see:

app@0b14ff4e14ef:~$ nvidia-smi -L
GPU 0: (UUID: GPU-684586d6-bed0-e6a7-78e2-bf784635fd1b)

Manually running the exact same container again, but with a different UUID:

sudo docker run -it --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=GPU-ca3d6363-ee91-b096-d9b3-3e150c5c134c myimage /bin/bash

and then running nvidia-smi -L inside that running container yields:

GPU 0: (UUID: GPU-ca3d6363-ee91-b096-d9b3-3e150c5c134c)

The same works for any UUID I try; everything behaves as expected when run manually.

Now on the Nomad side.

Assume I have a task in my job set up in the following manner (other irrelevant stuff left out):

    task "llm-runner-1" {
      driver = "docker"
      config {
        image = "myimage"
        ports = ["api-1"]
        runtime = "nvidia"
      }
      service {
        name = "llm-runner-1"
        port = "api-1"
        provider = "nomad"
      }
      resources {
        cpu    = 3000
        memory = 3000
        device "nvidia/gpu" {
          count = 1
        }
      }

      env {
        NVIDIA_VISIBLE_DEVICES = "GPU-684586d6-bed0-e6a7-78e2-bf784635fd1b"
      }
    }

Then, opening an Exec window from Nomad for that particular runner:

[screenshot of the Exec session omitted]

What docker inspect shows for the same running container:

[... omitted ...]
"NOMAD_TASK_DIR=/local",
"NOMAD_TASK_NAME=llm-runner-1",
"NVIDIA_VISIBLE_DEVICES=GPU-db76219f-0eaf-bd49-439e-7d82ae8aa526",
[... omitted ...]
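For reference, the env entries above can be pulled with something like this (container ID hypothetical):

sudo docker inspect <container-id> | grep NVIDIA_VISIBLE_DEVICES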

So I am not exactly sure where the wires are getting crossed. It behaves the same with or without runtime = "nvidia" specified.

BradyBonnette commented 3 days ago

🤦

I apologize. This might have been less of a bug and more of a documentation issue.

I discovered this: https://developer.hashicorp.com/nomad/docs/job-specification/device#affinity-towards-specific-gpu-devices, and it was exactly what I was looking for, except that I wanted a constraint rather than an affinity. Note that constraint works this way as well.
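In case it helps anyone else, the working stanza ends up looking roughly like this (using the first UUID from the listing above; the key detail is matching ${device.ids} with the set_contains operator, rather than a uuid attribute):

      resources {
        device "nvidia/gpu" {
          count = 1
          # Pin this task to one specific device; the value is the
          # UUID reported by nvidia-smi -L on the host.
          constraint {
            attribute = "${device.ids}"
            operator  = "set_contains"
            value     = "GPU-684586d6-bed0-e6a7-78e2-bf784635fd1b"
          }
        }
      }

With that in place there is no need to set NVIDIA_VISIBLE_DEVICES in env {} at all; the plugin injects it for whichever device it reserves, which also explains why my hand-set value kept getting overwritten above.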

I guess I was looking only at the nomad-device-nvidia documentation and never thought to check other sources.

Feel free to close this if you believe it's not a bug of any kind, or keep it open for further work (i.e., if you feel the documentation and/or internals surrounding NVIDIA_VISIBLE_DEVICES are misleading).