AuroraPerego closed this issue 1 month ago.
FYI @fwyzard @ivorobts
Thanks for the issue. Nice catch!
@GeorgeWeb I've checked the nightly-2024-10-14 build and the issue seems to be fixed. Thanks!
$ clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_60 -dM -E test.cpp | grep "NVIDIA"
#define __SYCL_TARGET_NVIDIA_GPU_SM_50__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_52__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_53__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_60__ 1
#define __SYCL_TARGET_NVIDIA_GPU_SM_61__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_62__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_70__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_72__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_75__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_80__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_86__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_87__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_89__ 0
#define __SYCL_TARGET_NVIDIA_GPU_SM_90__ 0
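A listing like the one above can also be checked mechanically. The C++ sketch below assumes the output of the clang++ ... -dM -E command has been captured as a string. The names NvidiaMacroReport and check_nvidia_target_macros are hypothetical helpers, not part of the SYCL headers or the compiler. The helper counts the NVIDIA target macros, how many of them are set to 1, and how many do not follow the __SYCL_TARGET_NVIDIA_GPU_SM_<number>__ spelling.

```cpp
#include <cassert>  // assert(), used when checking the returned report
#include <regex>
#include <sstream>
#include <string>

// Summary of the NVIDIA target macros found in a `-dM -E` dump.
struct NvidiaMacroReport {
    int total = 0;       // NVIDIA target macros seen
    int enabled = 0;     // macros defined to 1
    int misspelled = 0;  // names not of the form ..._SM_<digits>__
};

// Hypothetical helper: scans preprocessor output line by line.
NvidiaMacroReport check_nvidia_target_macros(const std::string& dump) {
    // Any "#define __SYCL_TARGET_NVIDIA_GPU_SM<suffix> <value>" line.
    static const std::regex any_nvidia(
        R"(^#define (__SYCL_TARGET_NVIDIA_GPU_SM\w*) (\d+)\s*$)");
    // The expected spelling, e.g. __SYCL_TARGET_NVIDIA_GPU_SM_60__.
    static const std::regex expected(R"(__SYCL_TARGET_NVIDIA_GPU_SM_\d+__)");

    NvidiaMacroReport report;
    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (!std::regex_match(line, m, any_nvidia)) {
            continue;  // not an NVIDIA target macro
        }
        ++report.total;
        if (m[2].str() == "1") {
            ++report.enabled;
        }
        if (!std::regex_match(m[1].str(), expected)) {
            ++report.misspelled;
        }
    }
    return report;
}
```

Applied to the 14 lines of the fixed listing above, the report should contain 14 macros, 1 enabled, and 0 misspelled. With the spelling described in the bug report, misspelled would be nonzero.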
Closing as https://github.com/intel/llvm/pull/15615 fixed it.
Describe the bug
When compiling AOT for a specific target, the corresponding macro is set to 1, while the macros for all the other targets are set to 0. However, for the CUDA backend, the macros that the compiler sets to 0 end with _SM**__, while the macro for the target actually being compiled for ends with _SM_**__ (where ** stands for the architecture number, e.g. 60). As an example, when compiling for the NVIDIA Pascal architecture, two macros for that architecture are defined, one with each spelling, while only one of the two should exist.
To reproduce
test.cpp
Environment
Additional context
The problem may be related to the definitions in the file /opt/intel/oneapi/compiler/latest/include/sycl/ext/oneapi/experimental/device_architecture.hpp.
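As a defensive measure for code built with an affected compiler, a translation unit that branches on these macros could refuse to compile when both spellings appear. This is a minimal sketch, assuming the pre-fix spelling for sm_60 was __SYCL_TARGET_NVIDIA_GPU_SM60__. That spelling is inferred from the description above, not confirmed. compiling_for_sm60 is a hypothetical helper.

```cpp
#include <cassert>  // assert(), used when checking the helper

// Assumed pre-fix spelling (SM60) next to the fixed one (SM_60): only one
// of them should ever be defined for a given architecture.
#if defined(__SYCL_TARGET_NVIDIA_GPU_SM60__) && \
    defined(__SYCL_TARGET_NVIDIA_GPU_SM_60__)
#error "both spellings of the sm_60 target macro are defined"
#endif

// Hypothetical helper: true only when the sm_60 target macro is set to 1.
constexpr bool compiling_for_sm60() {
#if defined(__SYCL_TARGET_NVIDIA_GPU_SM_60__) && __SYCL_TARGET_NVIDIA_GPU_SM_60__
    return true;
#else
    return false;
#endif
}
```

Checking defined(...) explicitly avoids relying on undefined identifiers evaluating to 0 (which -Wundef would flag), so in a plain host-only build compiling_for_sm60() simply returns false.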