intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

sycl can't find AMD GPU device #15959

Closed flg closed 1 week ago

flg commented 1 week ago

Describe the bug

Compiled SYCL LLVM (b7bb74553d, Thu Oct 31 14:41:40) with the following options:

$ python ./buildbot/configure.py --hip --hip-platform AMD --native_cpu -t release --cmake-gen "Unix Makefiles"

The server hosts these GPUs:

$ rocm-smi 
[...]
Device  [Model : Revision]    Temp    Power  Partitions      SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%  
        Name (20 chars)       (Edge)  (Avg)  (Mem, Compute)                                                   
==============================================================================================================
0       [0x0c34 : 0x02]       32.0°C  41.0W  N/A, N/A        800Mhz  1600Mhz  0%   auto  300.0W    0%   0%    
        Instinct MI210                                                                                        
1       [0x0c34 : 0x02]       32.0°C  40.0W  N/A, N/A        800Mhz  1600Mhz  0%   auto  300.0W    0%   0%    
        Instinct MI210

sycl-ls can't find them:

$ SYCL_UR_TRACE=1 ./sycl-ls --verbose --ignore-device-selectors
<LOADER>[INFO]: failed to load adapter 'libur_adapter_level_zero.so.0' with error: libur_adapter_level_zero.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_level_zero.so.0' with error: /home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_level_zero.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_opencl.so.0' with error: libur_adapter_opencl.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_opencl.so.0' with error: /home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_opencl.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_cuda.so.0' with error: libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_cuda.so.0' with error: /home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: loaded adapter 0x0x7bc9f0 (libur_adapter_hip.so.0)
<LOADER>[INFO]: failed to load adapter 'libur_adapter_native_cpu.so.0' with error: libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_native_cpu.so.0' with error: /home/ppd/flegoff/sycl_vector_add_amd/libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory

Platforms: 0
default_selector()      : No device of requested type available.
accelerator_selector()  : No device of requested type available.
cpu_selector()          : No device of requested type available.
gpu_selector()          : No device of requested type available.
custom_selector(gpu)    : No device of requested type available.
custom_selector(cpu)    : No device of requested type available.
custom_selector(acc)    : No device of requested type available.

Any idea?

Environment

OS: Rocky Linux release 9.4 (Blue Onyx)
ROCm: 6.0.0

flg commented 1 week ago

I found what the problem was. For the record, setting HIP_VISIBLE_DEVICES explicitly makes the devices visible:

$ HIP_VISIBLE_DEVICES=0,1 sycl-ls
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60032.83]
[hip:gpu][hip:1] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60032.83]
$ HIP_VISIBLE_DEVICES=0 sycl-ls
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Instinct MI210 gfx90a:sramecc+:xnack- [HIP 60032.83]
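
For anyone hitting the same symptom (the HIP adapter loads but sycl-ls reports "Platforms: 0"), it may help to check how the device-visibility variables are set in the shell before running sycl-ls. The thread does not say what HIP_VISIBLE_DEVICES held before the fix, so the following is only a minimal sketch: report_visibility is a hypothetical helper name, and ROCR_VISIBLE_DEVICES is a ROCm variable that is not mentioned in this thread and is checked here as an assumption. The helper relies on printenv, so it only sees exported variables, which are the ones a child process such as sycl-ls would inherit.

```shell
#!/bin/sh
# Hypothetical helper: report whether a device-visibility variable is
# unset, set but empty, or set to a value in the exported environment.
report_visibility() {
    name="$1"
    # printenv exits non-zero when the variable is not in the environment,
    # and prints an empty line (exit 0) when it is exported but empty.
    if value=$(printenv "$name"); then
        if [ -z "$value" ]; then
            echo "$name: set but empty"
        else
            echo "$name: $value"
        fi
    else
        echo "$name: unset"
    fi
}

# HIP_VISIBLE_DEVICES is the variable used in this thread;
# ROCR_VISIBLE_DEVICES is an assumption, checked for completeness.
report_visibility HIP_VISIBLE_DEVICES
report_visibility ROCR_VISIBLE_DEVICES
```

If the report looks unexpected, rerunning sycl-ls with the variable set on the command line, as in the commands above, is a quick way to compare the results.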