intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

Many AddressSanitizer fails on OCL CPU in Nightly #15461

Closed sarnex closed 3 days ago

sarnex commented 2 months ago

Describe the bug

https://github.com/intel/llvm/actions/runs/10952456651/job/30411378740

********************
Expectedly Failed Tests (1):
  SYCL :: AddressSanitizer/nullpointer/private_nullptr.cpp

********************
Failed Tests (37):
  SYCL :: AddressSanitizer/common/config-red-zone-size.cpp
  SYCL :: AddressSanitizer/common/demangle-kernel-name.cpp
  SYCL :: AddressSanitizer/common/kernel-debug.cpp
  SYCL :: AddressSanitizer/invalid-argument/host-pointer.cpp
  SYCL :: AddressSanitizer/memory-leak/memory-leak.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-int.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-long.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-short.cpp
  SYCL :: AddressSanitizer/multiple-reports/multiple_kernels.cpp
  SYCL :: AddressSanitizer/multiple-reports/one_kernel.cpp
  SYCL :: AddressSanitizer/nullpointer/global_nullptr.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope_unaligned.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/multi_device_images.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/large_group_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_char.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_double.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_func.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_int.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_short.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_no_local_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/unaligned_shadow_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_2d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_3d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_copy_fill.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/subbuffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_basic.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_function.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_multiargs.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/multiple_source.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/multiple_private.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/single_private.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-no-free.cpp
  SYCL :: AddressSanitizer/use-after-free/use-after-free.cpp

To reproduce

No response

Environment

No response

Additional context

No response

sarnex commented 2 months ago

@zhaomaosu @AllanZyne Can someone take a look at this? Thanks

AllanZyne commented 2 months ago

This is expected, since the OCL CPU driver hasn't been upgraded to the version we need. The real issue is why this wasn't detected in precommit CI. The OCL requirement was introduced by https://github.com/intel/llvm/pull/14891, but its tests did not fail on CPU. Do the nightly tests use a different OCL version?

sarnex commented 2 months ago

@AllanZyne I think it's because in precommit we run OCL CPU testing at the same time as OCL GPU and Level Zero on Gen12, e.g. with -DSYCL_TEST_E2E_TARGETS="level_zero:gpu;opencl:gpu;opencl:cpu", and in the DeviceSanitizer lit.local.cfg I see:

# FIXME: Skip some of gpu devices, waiting for gfx driver uplifting
config.unsupported_features += ['gpu-intel-gen9', 'gpu-intel-gen11', 'gpu-intel-gen12', 'gpu-intel-pvc']

and since we were running gen12 gpu testing at the same time, I think the unsupported line above kicked in, resulting in all tests being marked unsupported in the precommit run.

Unsupported Tests (689):
...
  SYCL :: AddressSanitizer/bad-free/bad-free-host.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-minus1.cpp
  SYCL :: AddressSanitizer/bad-free/bad-free-plus1.cpp
  SYCL :: AddressSanitizer/common/config-red-zone-size.cpp
  SYCL :: AddressSanitizer/common/demangle-kernel-name.cpp
  SYCL :: AddressSanitizer/common/kernel-debug.cpp
  SYCL :: AddressSanitizer/double-free/double-free.cpp
  SYCL :: AddressSanitizer/invalid-argument/bad-context.cpp
  SYCL :: AddressSanitizer/invalid-argument/host-pointer.cpp
  SYCL :: AddressSanitizer/invalid-argument/out-of-bounds.cpp
  SYCL :: AddressSanitizer/invalid-argument/released-pointer.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-int.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-long.cpp
  SYCL :: AddressSanitizer/misaligned/misalign-short.cpp
  SYCL :: AddressSanitizer/multiple-reports/multiple_kernels.cpp
  SYCL :: AddressSanitizer/multiple-reports/one_kernel.cpp
  SYCL :: AddressSanitizer/nullpointer/global_nullptr.cpp
  SYCL :: AddressSanitizer/nullpointer/private_nullptr.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/device_global_image_scope_unaligned.cpp
  SYCL :: AddressSanitizer/out-of-bounds/DeviceGlobal/multi_device_images.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/large_group_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_char.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_double.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_func.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_int.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_for_short.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/parallel_no_local_size.cpp
  SYCL :: AddressSanitizer/out-of-bounds/USM/unaligned_shadow_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_2d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_3d.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/buffer_copy_fill.cpp
  SYCL :: AddressSanitizer/out-of-bounds/buffer/subbuffer.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_basic.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_function.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/local_accessor_multiargs.cpp
  SYCL :: AddressSanitizer/out-of-bounds/local/multiple_source.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/multiple_private.cpp
  SYCL :: AddressSanitizer/out-of-bounds/private/single_private.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-free.cpp
  SYCL :: AddressSanitizer/use-after-free/quarantine-no-free.cpp
  SYCL :: AddressSanitizer/use-after-free/use-after-free.cpp
...

But in the nightly, we run OCL CPU testing individually, so the tests actually ran.
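
The behavior described above can be illustrated with a toy model. This is only a sketch under the assumption (taken from the analysis in this comment, not from the lit sources) that the unsupported check is made against one run-wide set of features; the `DEVICE_FEATURES` table and the `is_unsupported` helper are hypothetical names invented for illustration.

```python
# Toy model only: not the actual lit or intel/llvm implementation.
# Hypothetical feature sets contributed by each test target.
DEVICE_FEATURES = {
    "level_zero:gpu": {"gpu", "level_zero", "gpu-intel-gen12"},
    "opencl:gpu": {"gpu", "opencl", "gpu-intel-gen12"},
    "opencl:cpu": {"cpu", "opencl"},
}

# The list from the lit.local.cfg quoted in this thread.
UNSUPPORTED_FEATURES = [
    "gpu-intel-gen9",
    "gpu-intel-gen11",
    "gpu-intel-gen12",
    "gpu-intel-pvc",
]


def is_unsupported(targets):
    """Return True if any unsupported feature is in the run-wide feature set."""
    available = set()
    for target in targets:
        # Assumption: features of all targets are merged into one set.
        available |= DEVICE_FEATURES[target]
    return any(feature in available for feature in UNSUPPORTED_FEATURES)


# Precommit-style run: all three targets at once.
precommit = is_unsupported(["level_zero:gpu", "opencl:gpu", "opencl:cpu"])
# Nightly-style run: OCL CPU on its own.
nightly = is_unsupported(["opencl:cpu"])
print(precommit, nightly)  # True False
```

In this model the Gen12 feature from the GPU targets leaks into the shared set, so the CPU tests are skipped in the combined run but executed when CPU is tested alone.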

I'm not sure if it's possible to skip GPU testing but keep CPU testing if we run them all at once, fyi @aelovikov-intel

AllanZyne commented 2 months ago

Thank you for your analysis, @sarnex. @aelovikov-intel, can you tell me how to avoid skipping the CPU tests?

aelovikov-intel commented 2 months ago

Don't use "static" features in your conditions (as opposed to device-specific ones, like arch-intel_gpu_pvc).
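
A rough sketch of the distinction, assuming device-specific features are checked against each device's own feature set rather than a shared one. The `DEVICE_FEATURES` table and the `devices_to_run` helper are hypothetical and do not reflect the real lit.cfg.py logic.

```python
# Toy model only: hypothetical per-device feature sets.
DEVICE_FEATURES = {
    "level_zero:gpu": {"gpu", "level_zero", "arch-intel_gpu_pvc"},
    "opencl:cpu": {"cpu", "opencl"},
}


def devices_to_run(unsupported, device_features):
    """Keep each device whose own feature set has no unsupported feature."""
    blocked = set(unsupported)
    return [
        device
        for device, features in device_features.items()
        if not features & blocked
    ]


# Marking a device-specific feature unsupported drops only that device.
selected = devices_to_run(["arch-intel_gpu_pvc"], DEVICE_FEATURES)
print(selected)  # ['opencl:cpu']
```

Because the check is made per device in this sketch, excluding a GPU architecture leaves the CPU target in the run.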

AllanZyne commented 1 month ago

BTW, can we specify the version of OCL CPU in LIT tests? This will be very helpful for this case as well.

sarnex commented 1 month ago

I don't think we support that; we only have it for the GPU driver. FYI @aelovikov-intel

sarnex commented 1 month ago

@AllanZyne Do you know when we plan to update the OCLCPU driver to one where these tests work? Is it public yet?

AllanZyne commented 1 month ago

The new OCL CPU driver ships with oneAPI 2025.0. I hear from others that it will be released at the end of this month.

sarnex commented 1 month ago

Got it, thanks

AllanZyne commented 1 month ago

Hi, if I understand correctly: I want my tests to run on CPU or on GPU (DG2 only). I can't configure this in lit.local.cfg, because it's tedious to list every unsupported device, but I can write

// REQUIRES: linux, cpu || (gpu && level_zero && gpu-intel-dg2)

right?

and in the future, we'll enable pvc as well:

// REQUIRES: linux, cpu || (gpu && level_zero && (gpu-intel-dg2 || gpu-intel-pvc))
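
To check how the two REQUIRES lines above would behave, here is a small self-contained evaluator. It is a simplified stand-in for lit's boolean expression handling, not lit's own code: the `requires_satisfied` helper and the example feature sets are hypothetical, and the only lit behavior relied on is that comma-separated REQUIRES clauses must all hold.

```python
import re


def requires_satisfied(requires_line, features):
    """Evaluate a lit-style REQUIRES line against a set of feature names."""

    def eval_clause(clause):
        # Split into operators, parentheses, and feature names
        # (feature names may contain hyphens, e.g. gpu-intel-dg2).
        tokens = re.findall(r"\|\||&&|[()!]|[\w.+-]+", clause)
        parts = []
        for token in tokens:
            if token == "||":
                parts.append("or")
            elif token == "&&":
                parts.append("and")
            elif token == "!":
                parts.append("not")
            elif token in ("(", ")"):
                parts.append(token)
            else:
                # A feature name evaluates to whether it is available.
                parts.append(str(token in features))
        # The string now holds only True/False, and/or/not, and parentheses.
        return eval(" ".join(parts))

    # Every comma-separated clause must be satisfied.
    return all(eval_clause(clause) for clause in requires_line.split(","))


line = "linux, cpu || (gpu && level_zero && gpu-intel-dg2)"
cpu = {"linux", "cpu", "opencl"}
dg2 = {"linux", "gpu", "level_zero", "gpu-intel-dg2"}
gen12 = {"linux", "gpu", "level_zero", "gpu-intel-gen12"}
print(requires_satisfied(line, cpu))    # True
print(requires_satisfied(line, dg2))    # True
print(requires_satisfied(line, gen12))  # False
```

Under these assumed feature sets, the first expression admits CPU and DG2 and rejects Gen12, which matches the intent stated above.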

AllanZyne commented 1 month ago

I don't think it's possible to write unsupported_features using "arch-*" (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc), because we can't predict which GPU devices will be added to CI.

aelovikov-intel commented 1 month ago

If that's what you really need, you can always improve lit.cfg.py/lit.local.cfg infrastructure to make that possible...

sarnex commented 1 month ago

Replied on the PR but I think what you want should be possible

sarnex commented 3 days ago

These are all passing now with the driver bump, closing.