intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

[SYCL] throw-exception-for-out-of-registers-on-kernel-launch.cpp failing in WW06'24 pulldown #12679

Open jsji opened 9 months ago

jsji commented 9 months ago

Describe the bug: throw-exception-for-out-of-registers-on-kernel-launch.cpp is failing with the WW06 pulldown in https://github.com/intel/llvm/pull/12661.

https://github.com/intel/llvm/actions/runs/7840195347/job/21394406888

-- Testing: 1884 tests, 8 workers --
FAIL: SYCL :: OptionalKernelFeatures/throw-exception-for-out-of-registers-on-kernel-launch.cpp (1437 of 1884)
******************** TEST 'SYCL :: OptionalKernelFeatures/throw-exception-for-out-of-registers-on-kernel-launch.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/OptionalKernelFeatures/throw-exception-for-out-of-registers-on-kernel-launch.cpp -o /__w/llvm/llvm/build-e2e/OptionalKernelFeatures/Output/throw-exception-for-out-of-registers-on-kernel-launch.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/OptionalKernelFeatures/throw-exception-for-out-of-registers-on-kernel-launch.cpp -o /__w/llvm/llvm/build-e2e/OptionalKernelFeatures/Output/throw-exception-for-out-of-registers-on-kernel-launch.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 3
env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu  /__w/llvm/llvm/build-e2e/OptionalKernelFeatures/Output/throw-exception-for-out-of-registers-on-kernel-launch.cpp.tmp.out
# executed command: env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu /__w/llvm/llvm/build-e2e/OptionalKernelFeatures/Output/throw-exception-for-out-of-registers-on-kernel-launch.cpp.tmp.out
# note: command had no output on stdout or stderr
# error: command failed with exit status: 1

jsji commented 9 months ago

@GeorgeWeb Would you please help to have a look? Thanks.

GeorgeWeb commented 9 months ago

@jsji Yeah, no problem. If it is blocking the pulldown, the XFAIL commit is fine, because the test cannot guarantee that more than 64 registers are always going to be used in the kernel. I'll take a look at it on Monday. Thanks for pinging me!

jsji commented 9 months ago

Thanks. Yes, I have XFAILed it in pulldown. Thank you @GeorgeWeb .

aelovikov-intel commented 6 months ago

@GeorgeWeb any updates on this?

GeorgeWeb commented 6 months ago

Hey. I have no conclusive updates on this. I did look into it after it was flagged, and to be honest it is not easy to write a stable test that guarantees register spilling across compiler versions. I will have another chat internally to see whether there is a better suggestion than my own judgement; otherwise we may have to remove the test. The failure doesn't mean anything has broken; rather, the test itself is a bit unstable.
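
For readers wondering where the 64-register figure mentioned earlier in the thread might come from, here is a rough sketch of the arithmetic. It is not taken from the test or the SYCL runtime. It assumes a device that allows 65536 registers per block and a work-group size of 1024, both typical for many NVIDIA GPUs but not stated anywhere in this thread. The helper name `exceedsRegisterBudget` and the two constants are hypothetical, and the model ignores register allocation granularity.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of the SYCL runtime
// or of the failing test. Returns true when a launch would need more
// registers than a single block is allowed to use.
constexpr bool exceedsRegisterBudget(std::uint32_t regsPerThread,
                                     std::uint32_t workGroupSize,
                                     std::uint32_t regsPerBlock) {
  // Widen to 64 bits before multiplying so large inputs cannot overflow.
  return static_cast<std::uint64_t>(regsPerThread) * workGroupSize >
         regsPerBlock;
}

// Assumed device limits; query the real device rather than relying on these.
constexpr std::uint32_t kAssumedRegsPerBlock = 65536;
constexpr std::uint32_t kAssumedWorkGroupSize = 1024;

// With these assumptions, 64 registers per thread fits exactly
// (64 * 1024 == 65536), while 65 registers per thread does not.
static_assert(!exceedsRegisterBudget(64, kAssumedWorkGroupSize,
                                     kAssumedRegsPerBlock),
              "64 registers per thread should fit the assumed budget");
static_assert(exceedsRegisterBudget(65, kAssumedWorkGroupSize,
                                    kAssumedRegsPerBlock),
              "65 registers per thread should exceed the assumed budget");
```

Under these assumptions the test only gets the launch failure it expects if the compiler assigns more than 64 registers per thread to the kernel. That count is decided by the register allocator, so a different compiler version can drop the kernel to 64 or fewer and the expected exception never fires, which matches the instability described above.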