microsoft / vscode-ai-toolkit

MIT License

NVIDIA GPU not detected, driver not found? #42

Open sistemasITI opened 5 months ago

sistemasITI commented 5 months ago

Hi, I've tested in two different environments and neither of them detects the GPU. The first environment has an NVIDIA A100 and the second an NVIDIA RTX 3060 (both on Windows 11). I have the latest NVIDIA drivers installed. What am I missing, or what am I doing wrong?

This is the error output:

[2024-01-15T14:51:35.519Z] [INFO] Extenension: Invoking validateEnvironement for: nvidia-driver
Debug: validate-env[0] 03:51:35.60 0 ExecuteAsync Started
Information: validate-env[0] 03:51:35.67 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0] 03:51:35.73 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0] 03:51:35.73 0 ExecuteAsync Completed Elapsed:00:00:00.1431411

ningx-ms commented 5 months ago

Hi @sistemasITI, did you click the "Setup WSL Environment" button? It installs CUDA and Conda for WSL and then re-detects the environment.

elsaco commented 5 months ago

@sistemasITI at what stage is the GPU not being detected? Here's a screenshot after invoking the plugin:

win-ai-validate-env

Are you not getting the NVIDIA GPU detected in the prerequisites output?

Since your setup is failing with IsNvidiaDiverAvailable : False, are you using a driver with WSL support? What's your NVIDIA driver version?

sistemasITI commented 5 months ago

Hi, here is a screenshot of the status: Captura desde 2024-01-17 08-40-53

This is a screenshot from the server with an NVIDIA A100 (the result is the same on the PC with the NVIDIA 3060). As you can see, the "Setup WSL environment" button is disabled. Conda is also not detected, although it is installed.

The drivers are the latest available on the NVIDIA website.

ningx-ms commented 5 months ago

Could you share the most recent *.cli.log in %USERPROFILE%.wais on Windows, for example 20240104-260872-cli.log?

These logs will provide additional information on the check.

sistemasITI commented 5 months ago

Here it is:

`Debug: validate-env[0] 03:45:40.98 0 ExecuteAsync Started
Information: validate-env[0] 03:45:41.46 0 IsWSLDetected Execution
Error: validate-env[0] 03:45:54.27 0 Error: No LSB modules are available.
Information: validate-env[0] 03:45:54.27 0 The default WSL distribution is Ubuntu 18.04 or greater.
Information: validate-env[0] 03:45:54.27 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0] 03:45:54.78 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0] 03:45:54.78 0 ExecuteAsync Completed Elapsed:00:00:13.7974470
Debug: validate-env[0] 03:45:55.47 0 ExecuteAsync Started
Information: validate-env[0] 03:45:55.72 0 IsCondaInstalled Execution
Information: validate-env[0] 03:45:56.14 0 IsCondaInstalled : False
Debug: validate-env[0] 03:45:56.14 0 ExecuteAsync Completed Elapsed:00:00:00.6724627
Debug: validate-env[0] 03:45:56.25 0 ExecuteAsync Started
Information: validate-env[0] 03:45:56.49 0 IsCudaRuntimeInstalled Execution
Information: validate-env[0] 03:45:56.86 0 IsCudaRuntimeInstalled : True
Debug: validate-env[0] 03:45:56.86 0 ExecuteAsync Completed Elapsed:00:00:00.6119740
Debug: validate-env[0] 03:45:56.96 0 ExecuteAsync Started
Information: validate-env[0] 03:45:57.19 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0] 03:45:57.57 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0] 03:45:57.57 0 ExecuteAsync Completed Elapsed:00:00:00.6160153
Debug: validate-env[0] 03:45:57.68 0 ExecuteAsync Started
Information: validate-env[0] 03:45:57.91 0 IsWSLDetected Execution
Debug: validate-env[0] 03:45:58.08 0 ExecuteAsync Completed Elapsed:00:00:00.3982375
Debug: validate-env[0] 03:45:58.17 0 ExecuteAsync Started
Error: validate-env[0] 03:45:59.00 0 Error: No LSB modules are available.
Debug: validate-env[0] 03:45:59.00 0 ExecuteAsync Completed Elapsed:00:00:00.8291484
Debug: validate-env[0] 03:45:59.09 0 ExecuteAsync Started
Error: validate-env[0] 03:45:59.91 0 Error: No LSB modules are available.
Information: validate-env[0] 03:45:59.91 0 The default WSL distribution is Ubuntu 18.04 or greater.
Debug: validate-env[0] 03:45:59.91 0 ExecuteAsync Completed Elapsed:00:00:00.8202500`
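
For anyone reading a long cli.log like the one above, a small script can pull out just the check results. This is only a sketch: it assumes result lines keep the `IsSomething : True/False` shape seen in this log, and the function name `summarize_checks` is made up for illustration (it is not part of the extension).

```python
import re

# Matches result entries such as:
#   Information: validate-env[0] 03:45:56.14 0 IsCondaInstalled : False
# Assumption: every check result is written as "<IsName> : True|False".
RESULT_RE = re.compile(r"\b(Is\w+)\s*:\s*(True|False)\b")


def summarize_checks(log_text: str) -> dict[str, bool]:
    """Return the last reported True/False value for each Is* check."""
    results: dict[str, bool] = {}
    for name, value in RESULT_RE.findall(log_text):
        # A check can run more than once; later entries overwrite earlier ones.
        results[name] = value == "True"
    return results


if __name__ == "__main__":
    sample = (
        "Information: validate-env[0] 03:45:54.78 0 IsNvidiaDiverAvailable : False\n"
        "Information: validate-env[0] 03:45:56.14 0 IsCondaInstalled : False\n"
        "Information: validate-env[0] 03:45:56.86 0 IsCudaRuntimeInstalled : True\n"
    )
    for check, passed in summarize_checks(sample).items():
        print(f"{check}: {'ok' if passed else 'FAILED'}")
```

Because the pattern does not depend on line breaks, it works whether the log is pasted as one long line or one entry per line.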

sistemasITI commented 5 months ago

Hello, any suggestions?

ningx-ms commented 5 months ago

Hi @sistemasITI, could you try installing the NVIDIA driver for Windows from the official download page below?

https://www.nvidia.com/Download/index.aspx?lang=en-us

sistemasITI commented 5 months ago

Hello, I already had them installed, but I have reinstalled them on all 3 computers and the card is still not detected on any of them. What is happening? Why is the card not detected on any computer?

ningx-ms commented 5 months ago

The extension runs nvidia-smi.exe to detect the NVIDIA GPU. Could you run nvidia-smi in a Windows console and see whether it outputs something like the following?

image

It could be that nvidia-smi.exe is not on the system PATH.
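
A quick way to test the PATH theory is to look the executable up the same way a child process would. The sketch below is not the extension's own detection code (which is not shown in this thread); the helper names `find_nvidia_smi` and `list_gpus` are invented for illustration, and only the `nvidia-smi -L` call is taken from the commands used in this issue.

```python
import shutil
import subprocess


def find_nvidia_smi():
    """Return the full path of nvidia-smi if it is on PATH, else None."""
    # On Windows, shutil.which also tries the PATHEXT suffixes such as .exe.
    return shutil.which("nvidia-smi")


def list_gpus(executable):
    """Run `<executable> -L` and return one line per GPU it reports."""
    try:
        completed = subprocess.run(
            [executable, "-L"],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if completed.returncode != 0:
        return []
    return [line.strip() for line in completed.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    path = find_nvidia_smi()
    if path is None:
        print("nvidia-smi was not found on PATH")
    else:
        print(f"nvidia-smi found at {path}")
        for gpu in list_gpus(path):
            print(gpu)
```

Run it from the same kind of console the extension would inherit its environment from. A user-level PATH and a system-level PATH can differ, so a hit here does not prove the extension sees the same thing.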

elsaco commented 5 months ago

@ningx-ms how is the NVIDIA GPU detected/selected when the main display is driven by the iGPU?

On a notebook with the iGPU as the main display, the NVIDIA GPU is not seen by the plugin:

wsl_no_nvidia_gpu

nvidia-smi reports one GPU and no excluded devices:

PS C:\Users\elsaco> nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-1bfef509-e89e-9fef-e986-8979dab8e22a)
PS C:\Users\elsaco> nvidia-smi -B
No excluded devices found.

sistemasITI commented 5 months ago

nvidia-smi works fine. I also ran the two commands @elsaco mentioned:

Captura desde 2024-01-30 18-56-11

I don't know what is happening :(

leestott commented 3 weeks ago

Hi, I have seen this issue before.

Check cuDNN Installation:

First, update all packages:

sudo apt update
sudo apt upgrade 

Ensure that you have installed cuDNN correctly. You can download the cuDNN library from the NVIDIA website and follow the installation guide.

Install the CUDA drivers and onnxruntime:

pip install onnxruntime 
pip install onnxruntime-gpu 

Make sure the cuDNN library is in the expected location (usually /usr/local/cuda/lib64).

Check LD_LIBRARY_PATH: set the LD_LIBRARY_PATH environment variable to include the directory containing libcudnn.so.8. For example:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
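
To confirm that the export actually took effect in the shell you are using, something like the following can list where libcudnn.so.8 is visible. It is a rough sketch: `find_in_library_path` is a made-up helper name, and it only inspects the directories named in LD_LIBRARY_PATH. The dynamic loader also searches other locations (for example the ldconfig cache), so an empty result here is a hint, not proof that the library is missing.

```python
import os
from pathlib import Path


def find_in_library_path(lib_name, env_value=None):
    """Return every existing file named lib_name in the LD_LIBRARY_PATH directories."""
    if env_value is None:
        env_value = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for entry in env_value.split(os.pathsep):
        if not entry:
            continue  # ignore empty entries such as a trailing ':'
        candidate = Path(entry) / lib_name
        if candidate.exists():
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    hits = find_in_library_path("libcudnn.so.8")
    if hits:
        for hit in hits:
            print(f"found: {hit}")
    else:
        print("libcudnn.so.8 not found in any LD_LIBRARY_PATH directory")
```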

Verify CUDA Toolkit Version: Confirm that your installed CUDA Toolkit version matches the version expected by ONNX Runtime, which is the library that raises the libcudnn error in this case. You might need to install a different CUDA version or a compatible version of cuDNN.

If you then get an error saying a specific version is missing, e.g. libcudnn8, I recommend installing it manually.

Check whether the library is installed:

find / -type f -name "libcudnn.so.8" 2>/dev/null

For example, the error may look like this:

Failed loading model mistral-7b-v02-int4-gpu: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

You can then reinstall the specific version:

sudo apt-get install libcudnn8
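
After reinstalling, one rough check from Python is to ask ONNX Runtime which execution providers the installed package offers. This is a sketch, not a full diagnosis: `available_providers` and `has_cuda_provider` are illustrative helper names, while `onnxruntime.get_available_providers()` is the library's own call. Note that it reports what the installed build supports, so the libcudnn.so.8 error can still appear later when a model is actually loaded.

```python
def available_providers():
    """Return ONNX Runtime's provider list, or None if onnxruntime is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return ort.get_available_providers()


def has_cuda_provider(providers):
    """True only if the provider list exists and names the CUDA execution provider."""
    return providers is not None and "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    providers = available_providers()
    if providers is None:
        print("onnxruntime is not installed in this Python environment")
    else:
        print("providers:", ", ".join(providers))
        print("CUDA provider listed:", has_cuda_provider(providers))
```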

manwithaplandy commented 1 week ago

I'm having this same issue and I'm a bit confused by this last instruction. You are listing a bunch of Linux commands, but the issue is on a Windows machine. Do all of these NVIDIA packages need to be installed on both Windows and WSL? When I run nvidia-smi in cmd it works fine, but when I run it in WSL I get an error that it cannot communicate with the NVIDIA driver. When the extension checks for a valid GPU, does it run nvidia-smi in cmd/PowerShell or in WSL?
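
To make the cmd-versus-WSL difference easy to report, the two probes can be run side by side from Windows. This is a sketch under assumptions: the thread does not say which of the two the extension runs, the `probe` helper and the `CHECKS` table are made up for illustration, and the WSL probe targets whatever the default distribution is.

```python
import subprocess


def probe(command):
    """Run command and return (succeeded, output); a missing binary counts as failure."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            errors="replace",
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    output = (completed.stdout + completed.stderr).strip()
    return completed.returncode == 0, output


# "wsl.exe --" passes the rest of the command line to the default distribution.
CHECKS = {
    "Windows host": ["nvidia-smi", "-L"],
    "default WSL distribution": ["wsl.exe", "--", "nvidia-smi", "-L"],
}

if __name__ == "__main__":
    for label, command in CHECKS.items():
        ok, output = probe(command)
        print(f"[{label}] {'OK' if ok else 'FAILED'}")
        print(output or "(no output)")
```

If the host probe succeeds and the WSL probe fails, that matches the symptom described here and narrows the problem to the WSL side rather than the Windows driver.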