intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

Test SYCL :: Printf/int.cpp failing on CUDA #14734

Open lbushi25 opened 3 months ago

lbushi25 commented 3 months ago

### Describe the bug

The test SYCL :: Printf/int.cpp is failing on CUDA. Once this is resolved, please remove the XFAIL from the test source.

### To reproduce

```sh
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=nvptx64-nvidia-cuda  /__w/llvm/llvm/llvm/sycl/test-e2e/Printf/int.cpp -o /__w/llvm/llvm/build-e2e/Printf/Output/int.cpp.tmp.out
env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu  /__w/llvm/llvm/build-e2e/Printf/Output/int.cpp.tmp.out | /__w/llvm/llvm/toolchain/bin/FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Printf/int.cpp
```

### Environment

```
Platforms: 2
Platform [#1]:
    Version : CUDA 12.1
    Name : NVIDIA CUDA BACKEND
    Vendor : NVIDIA Corporation
    Devices : 1
        Device [#0]:
        Type : gpu
        Version : 8.6
        Name : NVIDIA A10G
        Vendor : NVIDIA Corporation
        Driver : CUDA 12.1
        UUID : 83176201134617011210741816[9]219207142181102
        Num SubDevices : 0
        Num SubSubDevices : 0
Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
        Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag
        info::device::sub_group_sizes: 32
        Architecture: nvidia_gpu_sm_86
Platform [#2]:
    Version : 0.1
    Name : SYCL_NATIVE_CPU
    Vendor : tbd
    Devices : 1
        Device [#0]:
        Type : cpu
        Version : 0.1
        Name : SYCL Native CPU
        Vendor : Intel(R) Corporation
        Driver : 0.0.0
        Num SubDevices : 0
        Num SubSubDevices : 0
        Aspects : cpu fp16 fp64 queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64
        info::device::sub_group_sizes: 1
        Architecture: unknown
default_selector()     : gpu, NVIDIA CUDA BACKEND, NVIDIA A10G 8.6 [CUDA 12.1]
accelerator_selector() : No device of requested type available. -1 (PI_ERRO...
cpu_selector()         : cpu, SYCL_NATIVE_CPU, SYCL Native CPU 0.1 [0.0.0]
gpu_selector()         : gpu, NVIDIA CUDA BACKEND, NVIDIA A10G 8.6 [CUDA 12.1]
custom_selector(gpu)   : gpu, NVIDIA CUDA BACKEND, NVIDIA A10G 8.6 [CUDA 12.1]
custom_selector(cpu)   : cpu, SYCL_NATIVE_CPU, SYCL Native CPU 0.1 [0.0.0]
custom_selector(acc)   : No device of requested type available. -1 (PI_ERRO...
```



### Additional context

_No response_
JackAKirk commented 1 month ago

cc @jchlanda

jchlanda commented 1 month ago

I've just run this on an A100 and it passes fine:

```
Decimal positive values:
        signed char: 123
        short: -27928
        int: 12345
        long: 1234567891
        long long: 1234567891
        intmax_t: %jd
        signed size_t: %zd
        ptrdiff_t: %td
Integer positive values:
        signed char: 123
        short: -27928
        int: 12345
        long: 1234567891
        long long: 1234567891
        intmax_t: %ji
        signed size_t: %zi
        ptrdiff_t: %ti
Decimal negative values:
        signed char: -123
        short: -27928
        int: -12345
        long: 3060399405
        long long: -1234567891
        intmax_t: %jd
        signed size_t: %zd
        ptrdiff_t: %td
Integer negative values:
        signed char: -123
        short: -27928
        int: -12345
        long: 3060399405
        long long: -1234567891
        intmax_t: %ji
        signed size_t: %zi
        ptrdiff_t: %ti
Octal:
        unsigned char: 123
        unsigned short: 111350
        unsigned int: 123456
        unsigned long: 12345670123
        unsigned long long: 12345670123
        uintmax_t: %jo
        size_t: %zo
        ptrdiff_t (unsigned version): %to
Hexadecimal:
        unsigned char: 12
        unsigned short: 92e8
        unsigned int: 1234
        unsigned long: 12345678
        unsigned long long: 12345678
        uintmax_t: %jx
        size_t: %zx
        ptrdiff_t: %tx
Hexadecimal (capital letters):
        unsigned char: 12
        unsigned short: 92E8
        unsigned int: 1234
        unsigned long: 12345678
        unsigned long long: 12345678
        uintmax_t: %jX
        size_t: %zX
        ptrdiff_t: %tX
Unsigned decimal:
        unsigned char: 123
        unsigned short: 37608
        unsigned int: 12345
        unsigned long: 1234567891
        unsigned long long: 1234567891
        uintmax_t: %ju
        size_t: %zu
        ptrdiff_t: %tu
```

Note that the test contains this comment:

FIXME: The 'short' type gets overflown with sporadic values on CUDA.

so there may be some non-deterministic behavior involved.

Additionally, I remember looking into this and filing a bug about the `hhd` specifier (https://forums.developer.nvidia.com/t/incorrect-results-for-printf-with-hhd-format-specifier/218643), but that seems to have been resolved.