tracel-ai / burn

Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.
https://burn.dev
Apache License 2.0

Detection issue with heterogeneous cards #2242

Closed by Thrathra 1 month ago

Thrathra commented 2 months ago

Describe the bug: Burn only lists one card with WGPU when both an AMD and an NVIDIA card are installed (7800XT and 2080Ti).

To Reproduce: Modify the MNIST example with WGPU to use card 1 instead of card 0. Burn stops, stating that only one card is detected (the AMD one in my case).

Expected behavior: Burn starts the example using the NVIDIA card.

Desktop

Additional context: I have two cards installed so that I can run tests easily. Running with the NVIDIA card alone works as expected.

laggui commented 1 month ago

Sorry for the late response, didn't notice this issue.

Just to make sure I understand this correctly, your AMD card is not being detected for WGPU?

Do you know if it is detected by Vulkan? Maybe something is missing for your AMD card 🤔

Thrathra commented 1 month ago

No problem. In fact, only the AMD card is detected; the NVIDIA one is not. nvidia-smi shows the card correctly, and I can use it with CUDA-specific code.

laggui commented 1 month ago

"Burn stops stating only one card is detected (the AMD one in my case)."

Whoops, I read that the other way around 😅 Thanks for clarifying.

You said you modified the example to specify which device to use. Do you have the code example? And perhaps the full error trace could also help identify the issue.

Thrathra commented 1 month ago

I ran the MNIST example with a single tweak:

Line 47: let device = WgpuDevice::DiscreteGpu(1);
Using 0 instead of 1 does select the AMD GPU.

The related error is: "No Discrete device found, adapters [...]". It only lists the AMD one.
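To illustrate why index 1 fails when only one discrete adapter is enumerated, here is a minimal sketch of index-based selection. The `Adapter` and `AdapterKind` types and the `select_discrete` function are hypothetical names made up for this example; this is not Burn's or wgpu's actual implementation, only a model of the behavior described above.

```rust
/// Hypothetical adapter categories (illustrative only, not Burn's types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AdapterKind {
    Discrete,
    Integrated,
}

/// Hypothetical description of one enumerated adapter.
#[derive(Debug, Clone, PartialEq)]
struct Adapter {
    name: String,
    kind: AdapterKind,
}

/// Return the `index`-th discrete adapter, or an error message that lists
/// every adapter that was actually enumerated.
fn select_discrete(adapters: &[Adapter], index: usize) -> Result<&Adapter, String> {
    adapters
        .iter()
        .filter(|a| a.kind == AdapterKind::Discrete)
        .nth(index)
        .ok_or_else(|| {
            let names: Vec<&str> = adapters.iter().map(|a| a.name.as_str()).collect();
            format!("no discrete adapter at index {index}; enumerated adapters: {names:?}")
        })
}
```

Under this model, a request for index 1 can only succeed if the lower layer (here, Vulkan) reports two discrete adapters, so a missing adapter in the error message points below the framework rather than at the index itself.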

laggui commented 1 month ago

Ahh ok, so the CUDA device is missing from the adapters listed in the error message.

When you say you can use the card with CUDA-specific code, do you mean that it works with Burn using something else?

If not, it could be that Vulkan does not detect the device. In that case you could check with the Vulkan tools (e.g., vulkaninfo or vkcube) to see whether it is a wgpu issue or a Vulkan issue.
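The Vulkan check suggested above could be scripted roughly as follows. This is only a sketch: it assumes `vulkaninfo` is on the PATH, that it accepts `--summary`, and that the summary contains lines of the form `deviceName = <name>`. The function names `device_names` and `vulkan_devices` are made up for this example.

```rust
use std::process::Command;

/// Extract device names from `vulkaninfo --summary` style text.
/// Assumes each device is reported on a line like `deviceName = <name>`.
fn device_names(summary: &str) -> Vec<String> {
    summary
        .lines()
        .filter_map(|line| {
            let (key, value) = line.split_once('=')?;
            (key.trim() == "deviceName").then(|| value.trim().to_string())
        })
        .collect()
}

/// Run `vulkaninfo --summary` and return the device names it reports.
/// Returns an I/O error if the tool cannot be started.
fn vulkan_devices() -> std::io::Result<Vec<String>> {
    let output = Command::new("vulkaninfo").arg("--summary").output()?;
    Ok(device_names(&String::from_utf8_lossy(&output.stdout)))
}
```

If the NVIDIA card is absent from the returned list, the problem most likely sits at the Vulkan level, which is consistent with wgpu listing only the AMD adapter.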

Thrathra commented 1 month ago

Thanks for your response. I checked, and it turns out the NVIDIA ICD file (the manifest that tells Vulkan where to find the driver) was missing. Everything works properly now. Thanks a lot!
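For anyone hitting the same symptom, a quick way to see which ICD manifests are present might look like the sketch below. It assumes a Linux system and a few directories where the Vulkan loader commonly looks for manifests; the list is not exhaustive, the thread does not say which operating system or path was involved, and `ICD_DIRS`, `icd_manifests` and `all_icd_manifests` are names made up for this example.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Directories commonly searched for Vulkan ICD manifests on Linux.
/// Assumed for illustration; a given distribution may use others.
const ICD_DIRS: [&str; 3] = [
    "/etc/vulkan/icd.d",
    "/usr/share/vulkan/icd.d",
    "/usr/local/share/vulkan/icd.d",
];

/// List the `.json` manifest files in one directory, sorted by path.
/// A missing or unreadable directory yields an empty list.
fn icd_manifests(dir: &Path) -> Vec<PathBuf> {
    let mut found: Vec<PathBuf> = match fs::read_dir(dir) {
        Ok(entries) => entries
            .filter_map(|entry| entry.ok())
            .map(|entry| entry.path())
            .filter(|path| path.extension().map_or(false, |ext| ext == "json"))
            .collect(),
        Err(_) => Vec::new(),
    };
    found.sort();
    found
}

/// Collect manifests from all of the assumed directories.
fn all_icd_manifests() -> Vec<PathBuf> {
    ICD_DIRS
        .iter()
        .flat_map(|dir| icd_manifests(Path::new(dir)))
        .collect()
}
```

If no manifest for the NVIDIA driver shows up (it is often named something like `nvidia_icd.json`, though that can vary), the Vulkan loader has no way to discover the card, regardless of what `nvidia-smi` reports.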

laggui commented 1 month ago

To be fair, I merely asked questions, you resolved the issue 😉

Glad to hear you managed to find the problem and fix it!