intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

-fsanitize=address __DeviceType link fail #14369

Open paboyle opened 6 days ago

paboyle commented 6 days ago

Describe the bug

Hi,

When I compile my library with -fsanitize=address in both CXXFLAGS and LDFLAGS, using icpx on Sunspot, I get:

sycl-post-link: device_global variable '__DeviceType' with property "device_image_scope" is used in more than one device image.

Is this an error in the compiler?

make[2]: Entering directory '/home/paboyle/FTHMC/Grid/systems/Aurora/HMC'

icpx -std=c++17 -I/home/paboyle/FTHMC/Grid -I/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/include -I/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/include -I/home/paboyle/spack/opt/spack/linux-sles15-skylake_avx512/gcc-7.5.0/c-lime-2-3-9-m557npl6yg2zsbomplzzf6rnaeqhnjxw/include -O3 -fiopenmp -fsycl-unnamed-lambda -fsycl -I/include -Wno-tautological-compare -I/home/paboyle/ -qmkl=parallel -fsycl -fsanitize=address -fno-strict-aliasing -L/home/paboyle/FTHMC/Grid/systems/Aurora/Grid -L/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/lib -L/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/lib -Wl,-z,now -Wl,-rpath -Wl,/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/lib -Wl,--enable-new-dtags -Wl,-z,now -Wl,-rpath -Wl,/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/lib -Wl,--enable-new-dtags -L/home/paboyle/spack/opt/spack/linux-sles15-skylake_avx512/gcc-7.5.0/c-lime-2-3-9-m557npl6yg2zsbomplzzf6rnaeqhnjxw/lib -fiopenmp -fsycl -fsycl-device-code-split=per_kernel -fsycl-device-lib=all -lze_loader -L/opt/aurora/24.086.0/CNDA/oneapi/mkl/develop_20240229/lib -qmkl=parallel -fsycl -lsycl -fsanitize=address -o ComputeWilsonFlow ComputeWilsonFlow.o ../Grid/libGrid.a -lmpicxx -lmpi -lmpicxx -lmpi -Wl,-z,now -Wl,-rpath -Wl,/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/lib -Wl,--enable-new-dtags -Wl,-z,now -Wl,-rpath -Wl,/opt/aurora/24.086.0/CNDA/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/lib -Wl,--enable-new-dtags -lz -lcrypto -llime -lmpfr -lgmp -lstdc++ -lm -lz

sycl-post-link: device_global variable '__DeviceType' with property "device_image_scope" is used in more than one device image.

make[2]: *** [Makefile:636: ComputeWilsonFlow] Error 1

I dug in as much as I could. I added -v -save-temps to the link command and found that the failing sub-command is:

/opt/aurora/24.086.0/CNDA/oneapi/compiler/eng-20240227/bin/compiler/sycl-post-link -split=kernel -emit-only-kernels-as-entry-points -emit-param-info -symbols -emit-exported-symbols -split-esimd -lower-esimd -O3 -spec-const=native -device-globals -o Benchmark_dwf_fp32-sycl-spir64-unknown-unknown.table Benchmark_dwf_fp32-sycl-spir64-unknown-unknown-c29c14.bc

Lots of temporary .o files were created in the current directory when I did this:

libsycl-sanitizer-sycl-spir64-unknown-unknown.o
libsycl-itt-user-wrappers-sycl-spir64-unknown-unknown.o
libsycl-itt-stubs-sycl-spir64-unknown-unknown.o
libsycl-itt-compiler-wrappers-sycl-spir64-unknown-unknown.o
libsycl-imf-sycl-spir64-unknown-unknown.o
libsycl-imf-fp64-sycl-spir64-unknown-unknown.o
libsycl-imf-bf16-sycl-spir64-unknown-unknown.o
libsycl-fallback-imf-sycl-spir64-unknown-unknown.o
libsycl-fallback-imf-fp64-sycl-spir64-unknown-unknown.o
libsycl-fallback-imf-bf16-sycl-spir64-unknown-unknown.o
libsycl-fallback-cstring-sycl-spir64-unknown-unknown.o
libsycl-fallback-complex-sycl-spir64-unknown-unknown.o
libsycl-fallback-complex-fp64-sycl-spir64-unknown-unknown.o
libsycl-fallback-cmath-sycl-spir64-unknown-unknown.o
libsycl-fallback-cmath-fp64-sycl-spir64-unknown-unknown.o
libsycl-fallback-cassert-sycl-spir64-unknown-unknown.o
libsycl-crt-sycl-spir64-unknown-unknown.o
libsycl-complex-sycl-spir64-unknown-unknown.o
libsycl-complex-fp64-sycl-spir64-unknown-unknown.o
libsycl-cmath-sycl-spir64-unknown-unknown.o
libsycl-cmath-fp64-sycl-spir64-unknown-unknown.o

Of these (and my own application library) the only one containing __DeviceType is

libsycl-sanitizer-sycl-spir64-unknown-unknown

paboyle@uan-0002>
for f in *.o ../Grid/libGrid.a ;
do
  echo $f ; 
  /opt/aurora/24.086.0/CNDA/oneapi/compiler/eng-20240227/bin/compiler/llvm-nm $f | grep DeviceType ;
done

libsycl-cmath-fp64-sycl-spir64-unknown-unknown.o
libsycl-cmath-sycl-spir64-unknown-unknown.o
libsycl-complex-fp64-sycl-spir64-unknown-unknown.o
libsycl-complex-sycl-spir64-unknown-unknown.o
libsycl-crt-sycl-spir64-unknown-unknown.o
libsycl-fallback-cassert-sycl-spir64-unknown-unknown.o
libsycl-fallback-cmath-fp64-sycl-spir64-unknown-unknown.o
libsycl-fallback-cmath-sycl-spir64-unknown-unknown.o
libsycl-fallback-complex-fp64-sycl-spir64-unknown-unknown.o
libsycl-fallback-complex-sycl-spir64-unknown-unknown.o
libsycl-fallback-cstring-sycl-spir64-unknown-unknown.o
libsycl-fallback-imf-bf16-sycl-spir64-unknown-unknown.o
libsycl-fallback-imf-fp64-sycl-spir64-unknown-unknown.o
libsycl-fallback-imf-sycl-spir64-unknown-unknown.o
libsycl-imf-bf16-sycl-spir64-unknown-unknown.o
libsycl-imf-fp64-sycl-spir64-unknown-unknown.o
libsycl-imf-sycl-spir64-unknown-unknown.o
libsycl-itt-compiler-wrappers-sycl-spir64-unknown-unknown.o
libsycl-itt-stubs-sycl-spir64-unknown-unknown.o
libsycl-itt-user-wrappers-sycl-spir64-unknown-unknown.o
libsycl-sanitizer-sycl-spir64-unknown-unknown.o
---------------- D __DeviceType

Feels to me that libsycl-sanitizer is associated with the fail.
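
The symbol hunt above can also be scripted so that only the files that actually mention the symbol are reported. The following is a minimal sketch, not part of the original report: the function names `find_symbol` and `scan` are hypothetical, it assumes an `nm`-compatible tool such as `llvm-nm` is on the PATH, and unlike the `grep DeviceType` above it matches the symbol name exactly.

```python
import subprocess
from typing import Optional


def find_symbol(nm_output: str, symbol: str) -> Optional[str]:
    """Return the nm type letter for `symbol`, or None if it is absent."""
    for line in nm_output.splitlines():
        fields = line.split()
        # nm lines look like "<address> <type> <name>"; undefined symbols
        # have no address, so only the last two fields are relied on.
        if len(fields) >= 2 and fields[-1] == symbol:
            return fields[-2]
    return None


def scan(paths, symbol, nm="llvm-nm"):
    """Run `nm` on each path; map files that mention `symbol` to its type letter.

    The default tool name is an assumption; pass the full path to the
    compiler's own llvm-nm if it is not on the PATH.
    """
    hits = {}
    for path in paths:
        result = subprocess.run([nm, path], capture_output=True, text=True)
        kind = find_symbol(result.stdout, symbol)
        if kind is not None:
            hits[path] = kind
    return hits
```

Calling `scan(list_of_object_files, "__DeviceType")` would then return a dictionary containing only the objects that define or reference the symbol, which avoids reading through the full file listing by eye.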

I can find a version of this under the compiler tree, as follows; the offending variable is __DeviceType:

nm /opt/aurora/24.086.0/CNDA/oneapi/compiler/eng-20240227/lib/libsycl-sanitizer.o
0000000000000010 B __AsanShadowMemoryGlobalEnd
0000000000000008 B __AsanShadowMemoryGlobalStart
0000000000000020 B __AsanShadowMemoryLocalEnd
0000000000000018 B __AsanShadowMemoryLocalStart
0000000000000000 W __clang_call_terminate
                 U __cxa_begin_catch
0000000000000000 t __cxx_global_var_init
0000000000000028 B __DeviceSanitizerReportMem
0000000000000280 B __DeviceType
0000000000000000 r GCC_except_table1
0000000000000020 t _GLOBAL__sub_I_sanitizer_utils_a4a308.cpp
                 U __gxx_personality_v0
0000000000000000 b _ZN4sycl3_V16detail12_GLOBAL__N_130__sycl_device_global_registrarE
0000000000000000 t _ZN4sycl3_V16detail12_GLOBAL__N_133__sycl_device_global_registrationC2Ev
                 U _ZN4sycl3_V16detail17device_global_map3addEPKvPKc
                 U _ZSt9terminatev

Any ideas how to avoid this and make the address sanitizer work?

To reproduce

  1. Include a code snippet that is as short as possible
  2. Specify the command which should be used to compile the program
  3. Specify the command which should be used to launch the program
  4. Indicate what is wrong and what was expected

Environment

Linux, Intel PVC

Additional context

No response

paboyle commented 6 days ago

On a hunch, I worried about the -fsycl-device-code-split=per_kernel flag I've been using. Removing it didn't help, but the sycl-post-link subcommand line changed from -split=kernel to -split=auto. If I instead remove the -split option from the sycl-post-link subcommand entirely, it succeeds:

Fails:

/opt/aurora/24.086.0/CNDA/oneapi/compiler/eng-20240227/bin/compiler/sycl-post-link -split=auto -emit-only-kernels-as-entry-points -emit-param-info -symbols -emit-exported-symbols -split-esimd -lower-esimd -O3 -spec-const=native -device-globals -o Benchmark_dwf_fp32-sycl-spir64-unknown-unknown.table Benchmark_dwf_fp32-sycl-spir64-unknown-unknown-9d8e06.bc
sycl-post-link: device_global variable '__DeviceType' with property "device_image_scope" is used in more than one device image.

Succeeds:

/opt/aurora/24.086.0/CNDA/oneapi/compiler/eng-20240227/bin/compiler/sycl-post-link -emit-only-kernels-as-entry-points -emit-param-info -symbols -emit-exported-symbols -split-esimd -lower-esimd -O3 -spec-const=native -device-globals -o Benchmark_dwf_fp32-sycl-spir64-unknown-unknown.table Benchmark_dwf_fp32-sycl-spir64-unknown-unknown-9d8e06.bc
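
Long command lines like these are hard to compare by eye, so here is a small sketch that checks which tokens differ between the failing and succeeding invocations. The helper name `flag_difference` is hypothetical, and the commands are copied from above with the tool path shortened to `sycl-post-link` for readability.

```python
import shlex


def flag_difference(failing: str, succeeding: str):
    """Return (only_in_failing, only_in_succeeding) as ordered token lists."""
    fail_tokens = shlex.split(failing)
    ok_tokens = shlex.split(succeeding)
    only_fail = [t for t in fail_tokens if t not in ok_tokens]
    only_ok = [t for t in ok_tokens if t not in fail_tokens]
    return only_fail, only_ok


failing = (
    "sycl-post-link -split=auto -emit-only-kernels-as-entry-points "
    "-emit-param-info -symbols -emit-exported-symbols -split-esimd "
    "-lower-esimd -O3 -spec-const=native -device-globals "
    "-o Benchmark_dwf_fp32-sycl-spir64-unknown-unknown.table "
    "Benchmark_dwf_fp32-sycl-spir64-unknown-unknown-9d8e06.bc"
)
succeeding = (
    "sycl-post-link -emit-only-kernels-as-entry-points "
    "-emit-param-info -symbols -emit-exported-symbols -split-esimd "
    "-lower-esimd -O3 -spec-const=native -device-globals "
    "-o Benchmark_dwf_fp32-sycl-spir64-unknown-unknown.table "
    "Benchmark_dwf_fp32-sycl-spir64-unknown-unknown-9d8e06.bc"
)

print(flag_difference(failing, succeeding))  # (['-split=auto'], [])
```

The only token that differs is `-split=auto`, which is consistent with the splitting step being what triggers the error.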

paboyle commented 6 days ago

So it appears that the device global strategy used in the SYCL -fsanitize=address utility is incompatible with most of the kernel split strategies, including the default?

And specifically the problem is in variables declared in:

./libdevice/sanitizer_utils.cpp:DeviceGlobal<DeviceType> __DeviceType;
./libdevice/sanitizer_utils.cpp:  if (__DeviceType == DeviceType::CPU) {
./libdevice/sanitizer_utils.cpp:  } else if (__DeviceType == DeviceType::GPU_PVC) {
./libdevice/sanitizer_utils.cpp:      __spirv_ocl_printf(__asan_print_unsupport_device_type, (int)__DeviceType);

When the kernel-to-device translation gets 'split', I guess it no longer appears as a global?
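
To make the suspected mechanism concrete, here is a toy model of the check that the error message describes. It is only a sketch under stated assumptions, not how sycl-post-link is actually implemented: the kernel names are hypothetical, and it assumes that every instrumented kernel ends up referencing `__DeviceType`, as the uses in sanitizer_utils.cpp suggest.

```python
from collections import defaultdict


def split_per_kernel(kernel_uses):
    """Place every kernel in its own device image (models -split=kernel)."""
    return [{name: uses} for name, uses in kernel_uses.items()]


def no_split(kernel_uses):
    """Keep all kernels in a single device image (models code split off)."""
    return [dict(kernel_uses)]


def image_scope_violations(images, image_scope_globals):
    """Return the image-scope globals that are used in more than one image."""
    images_using = defaultdict(set)
    for index, image in enumerate(images):
        for uses in image.values():
            for name in uses:
                if name in image_scope_globals:
                    images_using[name].add(index)
    return sorted(g for g, idx in images_using.items() if len(idx) > 1)


# Hypothetical kernels; the assumption is that ASan instrumentation makes
# each of them reference the sanitizer's __DeviceType device global.
kernel_uses = {
    "kernel_a": {"__DeviceType"},
    "kernel_b": {"__DeviceType"},
}
scoped = {"__DeviceType"}

print(image_scope_violations(split_per_kernel(kernel_uses), scoped))  # ['__DeviceType']
print(image_scope_violations(no_split(kernel_uses), scoped))          # []
```

In this model the variable is still a global, but once the kernels are spread over several images it is used from more than one of them, which is exactly the condition the "device_image_scope" message complains about. With a single image the condition cannot arise.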

paboyle commented 2 days ago

I can work around this in 2 ways:

  1. -fsycl-device-code-split=off

But this forces a huge and expensive overhead on first kernel call.

  2. Host-only address sanitization: -Xarch_host -fsanitize=address

I've successfully run host address sanitization on the problem I was debugging. I had to switch off use of MPI, as this was throwing a false positive, but I ran a single process cleanly through ASan with no errors, and with all leaks understood as allocate-once objects (plus many MPI leaks which I can do nothing about).

I think this issue is still a problem for device code sanitization, but now not a barrier for my personal need.