microsoft / vscode-remote-release

Visual Studio Code Remote Development: Open any folder in WSL, in a Docker container, or on a remote machine using SSH and take advantage of VS Code's full feature set.
https://aka.ms/vscode-remote

GPU detection: Not only check for runtime, but also number of GPUs #10307

Open · chrmarti opened 1 month ago

chrmarti commented 1 month ago


DevContainers v0.386.0 (pre-release)

Hello,

It seems that this feature is still broken (v0.386.0). If I create a remote machine (GCP) with a GPU and a fully installed NVIDIA stack, I can build and run the dev container using

"hostRequirements": {

    "gpu": "optional"

},

But if I remove the GPU from my remote machine, I can no longer start the Docker container, because the extension claims to have detected a GPU even though none is attached.

The output of the devcontainer console is:

```
[21551 ms] Start: Run: docker info -f {{.Runtimes.nvidia}}
[21755 ms] GPU support found, add GPU flags to docker call.
...
```

If I run the command you use in your TypeScript scripts on the machine (no GPU anymore), I get:

```
{nvidia-container-runtime [] }
```

I think you are only checking whether the nvidia-container-runtime is available, not whether an actual GPU is attached:

```ts
const runtimeFound = result.stdout.includes('nvidia-container-runtime');
```

So,

```ts
export async function extraRunArgs(common: ResolverParameters, params: DockerResolverParameters, config: DevContainerFromDockerfileConfig | DevContainerFromImageConfig) {
    const extraArguments: string[] = [];
    if (config.hostRequirements?.gpu) {
        if (await checkDockerSupportForGPU(params)) {
            common.output.write(`GPU support found, add GPU flags to docker call.`);
            extraArguments.push('--gpus', 'all');
        } else {
            if (config.hostRequirements?.gpu !== 'optional') {
                common.output.write('No GPU support found yet a GPU was required - consider marking it as "optional"', LogLevel.Warning);
            }
        }
    }
    return extraArguments;
}
```

will add `--gpus all` as long as the runtime is available, even if no GPU is attached. Unfortunately, the container won't start when `--gpus all` is passed but no GPU is attached to the machine. Am I missing something here?
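For illustration, the kind of additional probe I have in mind would look roughly like this (my own sketch, not the extension's code; it assumes `nvidia-smi` is on the host's PATH, and `hasAttachedGpu` is just a made-up helper name):

```ts
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

// Hypothetical helper: reports whether the NVIDIA driver actually sees a GPU,
// independent of whether the nvidia runtime is registered with Docker.
async function hasAttachedGpu(): Promise<boolean> {
    try {
        // `nvidia-smi -L` prints one line per detected GPU and fails (or
        // prints nothing) when no device is visible to the driver.
        const { stdout } = await execAsync('nvidia-smi -L');
        return stdout.trim().length > 0;
    } catch {
        return false; // nvidia-smi missing, or no GPU attached
    }
}
```

Combining such a probe with the existing runtime check would only add `--gpus all` when both the runtime and a physical GPU are present.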

Originally posted by @maro-otto in #9385

chrmarti commented 1 month ago

@maro-otto Could you share the output of `docker info --format '{{json .}}'` when you have a GPU installed? I think we might additionally have to check what the default runtime is.
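For reference, a rough sketch of such a check (my illustration only, assuming the `DefaultRuntime` field in the JSON output of `docker info` names the default runtime; `getDefaultDockerRuntime` is a hypothetical helper):

```ts
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

// Hypothetical helper: read Docker's default runtime, e.g. "runc" or "nvidia".
// A host where the nvidia runtime is the default may need different handling
// than one where it is merely registered as an additional runtime.
async function getDefaultDockerRuntime(): Promise<string | undefined> {
    try {
        const { stdout } = await execAsync(`docker info --format '{{json .}}'`);
        const info = JSON.parse(stdout);
        return typeof info.DefaultRuntime === 'string' ? info.DefaultRuntime : undefined;
    } catch {
        return undefined; // docker not available or output not parseable
    }
}
```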

maro-otto commented 1 month ago

@chrmarti `docker info --format '{{json .}}'` gives me (no GPU attached) `{nvidia-container-runtime [] }`

chrmarti commented 1 month ago

@maro-otto This looks like the output from `docker info -f {{.Runtimes.nvidia}}`; could you also run `docker info --format '{{json .}}'` with the GPU present?

maro-otto commented 1 month ago

@chrmarti Sorry for the late reply. With the GPU attached I get a similar result:

```
docker info -f {{.Runtimes.nvidia}}
{nvidia-container-runtime [] <nil>}
```

Additionally, `nvidia-smi` gives me:

```
nvidia-smi
Wed Oct  9 06:39:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:04.0 Off |                    0 |
| N/A   38C    P8              9W /  70W  |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```