microsoft / vscode-remote-release

Visual Studio Code Remote Development: Open any folder in WSL, in a Docker container, or on a remote machine using SSH and take advantage of VS Code's full feature set.
https://aka.ms/vscode-remote

GPU detection: Not only check for runtime, but also number of GPUs #10307

Open · chrmarti opened 1 month ago

chrmarti commented 1 month ago


DevContainers v0.386.0 (pre-release)

Hello,

It seems that this feature is still broken (v0.386.0). If I create a remote machine (GCP) with a GPU and a fully installed NVIDIA stack, I can build and run the dev container using

"hostRequirements": {

    "gpu": "optional"

},

But if I remove the GPU from my remote machine, I can no longer start the Docker container, because the extension claims to have detected a GPU even though none is attached.

The output of the devcontainer console is:

```
[21551 ms] Start: Run: docker info -f {{.Runtimes.nvidia}}
[21755 ms] GPU support found, add GPU flags to docker call.
...
```

If I run the command you use in your TypeScript scripts on the machine (no GPU anymore), I get:

```
{nvidia-container-runtime [] }
```

I think you are only checking whether the nvidia-container-runtime is available, not whether an actual GPU is attached:

```ts
const runtimeFound = result.stdout.includes('nvidia-container-runtime');
```

So,

```ts
export async function extraRunArgs(common: ResolverParameters, params: DockerResolverParameters, config: DevContainerFromDockerfileConfig | DevContainerFromImageConfig) {
    const extraArguments: string[] = [];
    if (config.hostRequirements?.gpu) {
        if (await checkDockerSupportForGPU(params)) {
            common.output.write(`GPU support found, add GPU flags to docker call.`);
            extraArguments.push('--gpus', 'all');
        } else {
            if (config.hostRequirements?.gpu !== 'optional') {
                common.output.write('No GPU support found yet a GPU was required - consider marking it as "optional"', LogLevel.Warning);
            }
        }
    }
    return extraArguments;
}
```

will add `--gpus all` as long as the runtime is available, even if no GPU is attached. Unfortunately, the container won't start when `--gpus all` is passed but no GPU is attached to the machine. Am I missing something here?
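For illustration, the kind of additional probe I have in mind would look roughly like this (my own sketch, not the extension's code; it assumes `nvidia-smi` is on the host's PATH, and `hasAttachedGpu` is just a made-up helper name):

```ts
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

// Hypothetical helper: reports whether the NVIDIA driver actually sees a GPU,
// independent of whether the nvidia runtime is registered with Docker.
async function hasAttachedGpu(): Promise<boolean> {
    try {
        // `nvidia-smi -L` prints one line per detected GPU and fails (or
        // prints nothing) when no device is visible to the driver.
        const { stdout } = await execAsync('nvidia-smi -L');
        return stdout.trim().length > 0;
    } catch {
        return false; // nvidia-smi missing, or no GPU attached
    }
}
```

Combining such a probe with the existing runtime check would only add `--gpus all` when both the runtime and a physical GPU are present.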

Originally posted by @maro-otto in #9385

chrmarti commented 1 month ago

@maro-otto Could you share the output of `docker info --format '{{json .}}'` when you have a GPU installed? I think we might additionally have to check what the default runtime is.
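For reference, a rough sketch of such a check (my illustration only, assuming the `DefaultRuntime` field in the JSON output of `docker info` names the default runtime; `getDefaultDockerRuntime` is a hypothetical helper):

```ts
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

// Hypothetical helper: read Docker's default runtime, e.g. "runc" or "nvidia".
// A host where the nvidia runtime is the default may need different handling
// than one where it is merely registered as an additional runtime.
async function getDefaultDockerRuntime(): Promise<string | undefined> {
    try {
        const { stdout } = await execAsync(`docker info --format '{{json .}}'`);
        const info = JSON.parse(stdout);
        return typeof info.DefaultRuntime === 'string' ? info.DefaultRuntime : undefined;
    } catch {
        return undefined; // docker not available or output not parseable
    }
}
```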

maro-otto commented 1 month ago

@chrmarti `docker info --format '{{json .}}'` gives me (no GPU attached) `{nvidia-container-runtime [] }`

chrmarti commented 1 month ago

@maro-otto This looks like the output from `docker info -f {{.Runtimes.nvidia}}`; could you also run `docker info --format '{{json .}}'` with the GPU present?

maro-otto commented 1 month ago

@chrmarti Sorry for the late reply. With the GPU attached I get a similar result:

```
docker info -f {{.Runtimes.nvidia}}
{nvidia-container-runtime [] <nil>}
```

Additionally, `nvidia-smi` gives me:

```
nvidia-smi
Wed Oct  9 06:39:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:04.0 Off |                    0 |
| N/A   38C    P8              9W /  70W  |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```