Segfault building pytorch sycl kernels with intel/llvm: binary Instruction seen with illegal int type

dvrogozh commented 2 months ago

I am trying to use intel/llvm instead of the DPC++ compiler to build the PyTorch XPU backend, which has SYCL kernels. I ran into a few issues along the way, for which I have hacks/workarounds. However, I also see segmentation faults in the ocloc compiler at the device link stage for xe-lpg when building the following 2 kernels:

The error being printed is:

Binary Instruction seen with illegal int type. Legalization support missing. Inst opcode:25[0]: /lib/x86_64-linux-gnu/libocloc.so(+0xc1a64) [0x7fcca7221a64]

Note that the DPC++ compiler version 2024.1 (officially used for PyTorch builds, following https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html) builds these kernels successfully. So DPC++ 2024.1 and intel/llvm run against the same ocloc version and GPU stack on my system, and only the latter segfaults. Based on that, I think debugging should start at the intel/llvm level. I am also concerned that DPC++ 2025 will have the same issue; that compiler is not currently verified to work with the PyTorch XPU backend.

Call stack with the issue:

icpx -fPIC -fsycl -fpreview-breaking-changes -fsycl-targets=spir64_gen,spir64 -fno-sycl-unnamed-lambda -sycl-std=2020 -fhonor-nans -fhonor-infinities -fno-associative-math -fno-approx-func -Wno-absolute-value -D__INTEL_PREVIEW_BREAKING_CHANGES -D_GLIBCXX_USE_CXX11_ABI=1 -fsycl-fp64-conv-emu -fsycl-max-parallel-link-jobs=208 -fsycl-targets=spir64_gen,spir64 -fsycl-link out.o -Xs -device\ xe-lpg\ -options\ '\ -cl-poison-unsupported-fp64-kernels\ -cl-intel-enable-auto-large-GRF-mode\ -cl-fp32-correctly-rounded-divide-sqrt' -o a.o
llvm-foreach: adjusted number of threads to 160 (max safe available).
llvm-foreach: adjusted number of threads to 160 (max safe available).
Compilation from IR - skipping loading of FCL
Compilation from IR - skipping loading of FCL

warning: kernel _ZTSN2at6native3xpu12ReduceKernelILi1ENS1_8ReduceOpIN3c104HalfENS1_9ArgMaxOpsIfEEjlLi4EEEEE  compiled SIMD8 allocated 128 regs and spilled around 8

warning: kernel _ZTSN2at6native3xpu12ReduceKernelILi4ENS1_8ReduceOpIN3c104HalfENS1_9ArgMaxOpsIfEEjlLi4EEEEE  compiled SIMD8 allocated 128 regs and spilled around 54

Build succeeded for : arl-s.
Compilation from IR - skipping loading of FCL

warning: kernel _ZTSN2at6native3xpu12ReduceKernelILi1ENS1_8ReduceOpIN3c104HalfENS1_9ArgMaxOpsIfEEjlLi4EEEEE  compiled SIMD8 allocated 128 regs and spilled around 8

warning: kernel _ZTSN2at6native3xpu12ReduceKernelILi4ENS1_8ReduceOpIN3c104HalfENS1_9ArgMaxOpsIfEEjlLi4EEEEE  compiled SIMD8 allocated 128 regs and spilled around 54

Build succeeded for : mtl-h.
Binary Instruction seen with illegal int type. Legalization support missing. Inst opcode:25[0]: /lib/x86_64-linux-gnu/libocloc.so(+0xc1a64) [0x7fcca7221a64]
[1]: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fcca6f79520]
[2]: /lib/x86_64-linux-gnu/libigc.so.1(+0x96912f) [0x7fcca21a712f]
[3]: /lib/x86_64-linux-gnu/libigc.so.1(+0xd014c9) [0x7fcca253f4c9]
[4]: /lib/x86_64-linux-gnu/libigc.so.1(+0xd08bad) [0x7fcca2546bad]
[5]: /lib/x86_64-linux-gnu/libigc.so.1(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x2be) [0x7fcca2fe51ae]
[6]: /lib/x86_64-linux-gnu/libigc.so.1(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x34) [0x7fcca2fe54d4]
[7]: /lib/x86_64-linux-gnu/libigc.so.1(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x32c) [0x7fcca2fe626c]
[8]: /lib/x86_64-linux-gnu/libigc.so.1(+0xc861b2) [0x7fcca24c41b2]
[9]: /lib/x86_64-linux-gnu/libigc.so.1(+0x90a55e) [0x7fcca214855e]
[10]: /lib/x86_64-linux-gnu/libigc.so.1(+0xb6b61b) [0x7fcca23a961b]
[11]: /lib/x86_64-linux-gnu/libigc.so.1(+0x90cf27) [0x7fcca214af27]
[12]: /lib/x86_64-linux-gnu/libigc.so.1(+0x984ccd) [0x7fcca21c2ccd]
[13]: /lib/x86_64-linux-gnu/libigc.so.1(+0x9861de) [0x7fcca21c41de]
[14]: /lib/x86_64-linux-gnu/libocloc.so(+0x9a386) [0x7fcca71fa386]
[15]: /lib/x86_64-linux-gnu/libocloc.so(+0xc3acf) [0x7fcca7223acf]
[16]: /lib/x86_64-linux-gnu/libocloc.so(+0xc1cc8) [0x7fcca7221cc8]
[17]: /lib/x86_64-linux-gnu/libocloc.so(+0x89ca9) [0x7fcca71e9ca9]
[18]: /lib/x86_64-linux-gnu/libocloc.so(oclocInvoke+0x8ee) [0x7fcca71eb81e]
[19]: /usr/bin/ocloc(+0x637) [0x5610826ac637]
[20]: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fcca6f60d90]
[21]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fcca6f60e40]
[22]: /usr/bin/ocloc(+0x665) [0x5610826ac665]
llvm-foreach: Segmentation fault (core dumped)
icpx: error: gen compiler command failed with exit code 254 (use -v to see invocation)
clang version 19.0.0git (https://github.com/intel/llvm.git e16b0a434b14089140e4bd76f27adc18c9d782ae)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/dvrogozh/git/llvm/build/install/bin
Build config: +assertions
icpx: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.
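Most frames in the trace above carry only a module path and an offset. As a minimal sketch (the helper name `unnamed_frames` is hypothetical, and the pattern assumes exactly the frame layout printed above), the following extracts those module/offset pairs so they can be handed to a symbolizer:

```shell
# Read a backtrace on standard input and print "<module> <offset>" for every
# frame of the form
#   [N]: /path/to/lib.so(+0xOFFSET) [0xADDRESS]
# Frames that already carry a symbol name, such as (oclocInvoke+0x8ee), are skipped.
# Frame [0] is matched even though the error message is fused to it.
unnamed_frames() {
  sed -n 's|^.*\[[0-9][0-9]*]: \(/[^(]*\)(+\(0x[0-9a-f]*\)) \[0x[0-9a-f]*]$|\1 \2|p'
}

# Example (build.log is a hypothetical file holding the output above):
#   unnamed_frames < build.log
```

Each resulting pair could then be passed to a tool such as `addr2line -e <module> <offset>`, which only yields useful names if debug symbols for libigc and libocloc are installed.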

Easier reproducer

Build intel/llvm
Get pre-compiled faulty kernel: https://github.com/dvrogozh/pytorch/blob/intel-llvm/ReduceArgMaxKernel_preproc.ii

Try to link; this should reproduce the failure (I reduced the link options to the minimum):

icpx -fsycl -sycl-std=2020 -fsycl-targets=spir64_gen,spir64 -fsycl-link ReduceArgMaxKernel_preproc.ii -Xs "-device xe-lpg" -o a.o

Full reproducer

Additional effort will be needed to simplify the reproducer. Below are the current reproduction steps, which assume building the PyTorch XPU backend.

Install dGPU drivers from https://dgpu-docs.intel.com/driver/installation.html#installation
Build intel/llvm and add symbolic links: {icx, icpx} -> clang-19
Create virtual environment: python3 -m venv ~/pytorch.xpu

Configure environment:

source ~/pytorch.xpu/bin/activate
export _GLIBCXX_USE_CXX11_ABI=1
export USE_XPU=1
export PATH=/home/dvrogozh/git/install/bin:$PATH
export LD_LIBRARY_PATH=$(pwd)/build/install/lib
export SYCL_ROOT=$(pwd)/build/install/
export CMPLR_ROOT=$(pwd)/build/install/

Build pti-gpu:

git clone https://github.com/intel/pti-gpu.git
cd pti-gpu/sdk
mkdir _build && cd _build
cmake -DCMAKE_INSTALL_PREFIX=$llvm/build/install -DPTI_BUILD_TESTING=OFF -DPTI_BUILD_SAMPLES=OFF -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/icpx_toolchain.cmake ..
make && make install

Build pytorch. This will reproduce the issue:

git clone -b intel-llvm https://github.com/dvrogozh/pytorch.git
cd pytorch
git submodule update --init --recursive
pip3 install -r requirements.txt
python3 setup.py develop

Once the above steps to build PyTorch are done, you can run these 2 commands to reproduce the issue (they require some generated files from the overall PyTorch build and can't be run beforehand). Note that the compilation step uses g++ as the host compiler (-fsycl-host-compiler=/usr/bin/c++ in the command below); that is how the PyTorch XPU backend is built in general.

# compile step
/home/dvrogozh/git/install/bin/icpx -MD -MF deps.o.SYCL-depend -c /home/dvrogozh/git/pytorch/pytorch-clang/third_party/torch-xpu-ops/src/ATen/native/xpu/sycl/ReduceArgMaxKernel.cpp -o out.o -I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/pytorch/pytorch-clang/build/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/build -I/home/dvrogozh/git/pytorch/pytorch-clang -I/home/dvrogozh/git/pytorch/pytorch-clang/build/third_party/gloo -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/gloo -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/tensorpipe/third_party/libuv/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/googletest/googlemock/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/googletest/googletest/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/protobuf/src -I/opt/intel/oneapi/mkl/latest/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/XNNPACK/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/benchmark/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ittapi/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/eigen -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/onnx -I/home/dvrogozh/git/pytorch/pytorch-clang/build/third_party/onnx -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/opt/intel/oneapi/mkl/latest/include -I/home/dvrogozh/git/pytorch/pytorch-clang/nlohmann -I/home/dvrogozh/git/pytorch/pytorch-clang/INTERFACE -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/nlohmann/include -I/home/dvrogozh/git/pytorch/pytorch-clang/torch/csrc/api 
-I/home/dvrogozh/git/pytorch/pytorch-clang/torch/csrc/api/include -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/build/caffe2/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/build/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/.. -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/miniz-2.1.0 -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu/detail -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include -I/home/dvrogozh/git/pytorch/pytorch-clang/build/xpu_mkldnn_proj-prefix/src/xpu_mkldnn_proj-build/include -I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu/detail -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include -I/home/dvrogozh/git/pytorch/pytorch-clang/build/xpu_mkldnn_proj-prefix/src/xpu_mkldnn_proj-build/include -I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/torch-xpu-ops/src -I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/install/include/sycl -fsycl-host-compiler=/usr/bin/c++ "-fsycl-host-compiler-options=-I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/pytorch/pytorch-clang/build/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/build -I/home/dvrogozh/git/pytorch/pytorch-clang 
-I/home/dvrogozh/git/pytorch/pytorch-clang/build/third_party/gloo -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/gloo -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/tensorpipe/third_party/libuv/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/googletest/googlemock/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/googletest/googletest/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/protobuf/src -I/opt/intel/oneapi/mkl/latest/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/XNNPACK/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/benchmark/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ittapi/include -I/home/dvrogozh/git/pytorch/pytorch-clang/cmake/../third_party/eigen -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/onnx -I/home/dvrogozh/git/pytorch/pytorch-clang/build/third_party/onnx -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/include -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include/oneapi/dnnl -I/opt/intel/oneapi/mkl/latest/include -I/home/dvrogozh/git/pytorch/pytorch-clang/nlohmann -I/home/dvrogozh/git/pytorch/pytorch-clang/INTERFACE -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/nlohmann/include -I/home/dvrogozh/git/pytorch/pytorch-clang/torch/csrc/api -I/home/dvrogozh/git/pytorch/pytorch-clang/torch/csrc/api/include -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/build/caffe2/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/build/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/.. 
-I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/miniz-2.1.0 -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu/detail -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include -I/home/dvrogozh/git/pytorch/pytorch-clang/build/xpu_mkldnn_proj-prefix/src/xpu_mkldnn_proj-build/include -I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/native/mkldnn/xpu/detail -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/ideep/mkl-dnn/include -I/home/dvrogozh/git/pytorch/pytorch-clang/build/xpu_mkldnn_proj-prefix/src/xpu_mkldnn_proj-build/include -I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/pytorch/pytorch-clang/aten/src/ATen/xpu -I/home/dvrogozh/git/pytorch/pytorch-clang/third_party/torch-xpu-ops/src -I/home/dvrogozh/git/install/include -I/home/dvrogozh/git/install/include/sycl -I/home/dvrogozh/git/install/include/sycl -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -D__INTEL_PREVIEW_BREAKING_CHANGES -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=OFF -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno 
-fno-trapping-math -Werror=format -DUSE_XPU -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -std=c++17 -Wno-deprecated-declarations -Wno-deprecated -Wno-attributes -Wno-sign-compare -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -DIDEEP_USE_MKL -DHAVE_MMAP=1 -D_FILE_OFFSET_BITS=64 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DFLASHATTENTION_DISABLE_ALIBI " -fsycl -fpreview-breaking-changes -fsycl-targets=spir64_gen,spir64 -fno-sycl-unnamed-lambda -sycl-std=2020 -fhonor-nans -fhonor-infinities -fno-associative-math -fno-approx-func -Wno-absolute-value -D__INTEL_PREVIEW_BREAKING_CHANGES -D_GLIBCXX_USE_CXX11_ABI=1 -fsycl-fp64-conv-emu -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -DIDEEP_USE_MKL -DHAVE_MMAP=1 -D_FILE_OFFSET_BITS=64 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DFLASHATTENTION_DISABLE_ALIBI

# device link step
icpx -fPIC -fsycl -fpreview-breaking-changes -fsycl-targets=spir64_gen,spir64 -fno-sycl-unnamed-lambda -sycl-std=2020 -fhonor-nans -fhonor-infinities -fno-associative-math -fno-approx-func -Wno-absolute-value -D__INTEL_PREVIEW_BREAKING_CHANGES -D_GLIBCXX_USE_CXX11_ABI=1 -fsycl-fp64-conv-emu -fsycl-max-parallel-link-jobs=208 -fsycl-targets=spir64_gen,spir64 -fsycl-link out.o -Xs -device\ pvc\ -options\ '\ -cl-poison-unsupported-fp64-kernels\ -cl-intel-enable-auto-large-GRF-mode\ -cl-fp32-correctly-rounded-divide-sqrt' -o a.o

Observations:

-device pvc works fine, but -device xe-lpg fails with a segfault
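To compare device targets side by side, a small loop over the reduced link command from the "Easier reproducer" section can help. This is only a sketch: the function name `try_device`, the `ICPX` and `INPUT` variables, and the log file names are assumptions for illustration, and whether the reduced command behaves the same as the full link step for every target has not been verified here.

```shell
# ICPX and INPUT are illustrative defaults; override them for the local setup.
ICPX="${ICPX:-icpx}"
INPUT="${INPUT:-ReduceArgMaxKernel_preproc.ii}"

# Run the reduced device link step for one target and report the outcome.
# Compiler output goes to link.<device>.log so a crash trace is kept.
try_device() {
  dev="$1"
  if "$ICPX" -fsycl -sycl-std=2020 -fsycl-targets=spir64_gen,spir64 \
       -fsycl-link "$INPUT" -Xs "-device $dev" -o "a.$dev.o" \
       >"link.$dev.log" 2>&1; then
    echo "$dev: ok"
  else
    echo "$dev: FAILED (see link.$dev.log)"
  fi
}

# Example:
#   for dev in pvc xe-lpg; do try_device "$dev"; done
```

The check relies only on the exit status of the compiler driver, which is non-zero in the failing run shown earlier ("gen compiler command failed with exit code 254").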

cc: @mdtoguchi, @paigeale

dvrogozh commented 2 months ago

Driver stack versions on my side:

$ apt-cache show libigc1 | grep Version | head -1
Version: 1.0.17193.16-950~22.04

$ apt-cache show libigdfcl1 | grep Version | head -1
Version: 1.0.17193.16-950~22.04

$ apt-cache show intel-opencl-icd | grep Version | head -1
Version: 24.26.30049.10-950~22.04

$ apt-cache show level-zero | grep Version | head -1
Version: 1.16.15-881~22.04

$ apt-cache show intel-level-zero-gpu | grep Version | head -1
Version: 1.3.30049.10-950~22.04
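The same version queries can be collected in one pass. The sketch below uses hypothetical helper names (`first_version`, `report_versions`) and mirrors the `apt-cache show ... | grep Version | head -1` pattern used above:

```shell
# Print the first "Version:" field found on standard input.
first_version() {
  sed -n 's/^Version: //p' | head -n 1
}

# Print "<package> <version>" for each package name given as an argument.
report_versions() {
  for pkg in "$@"; do
    printf '%s %s\n' "$pkg" "$(apt-cache show "$pkg" 2>/dev/null | first_version)"
  done
}

# Example:
#   report_versions libigc1 libigdfcl1 intel-opencl-icd level-zero intel-level-zero-gpu
```

Note that `apt-cache show` reports package records known to the package cache, which may differ from what is installed; `dpkg-query -W <package>` can be used to confirm the installed version.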

mdtoguchi commented 2 months ago

A couple of items to note that allow the issue to be reproduced with a recent intel/llvm-based compiler:

icpx here is not the DPC++ compiler, but rather a symlink to clang++.
The issue currently reproduces with the intel/llvm compiler and not with the DPC++ compiler.
Compilation requires the use of -fsycl-host-compiler=g++, as the host code will not compile with clang.

dvrogozh commented 2 months ago

Debug observations on my side:

As you can see, the generated kernel is a "switch" kernel that handles multiple data types. The issue occurs only when handling the uint8_t type, i.e. on the call to argmax_kernel_impl<uint8_t>(iter).
I don't see the issue after forcing noinline on one of the functions used in the kernel definition. See below.

Patch:

--- a/src/ATen/native/xpu/sycl/SharedReduceOps.h
+++ b/src/ATen/native/xpu/sycl/SharedReduceOps.h
@@ -349,6 +349,7 @@ struct MinMaxReductionOps {
     return comp_t{}(a.first, b.first, a.second, b.second) ? a : b;
   }

+  __attribute__((noinline))
   static arg_t translate_idx(arg_t a, int64_t base_idx) {
     return {a.first, a.second + base_idx};
   }

This function: https://github.com/intel/torch-xpu-ops/blob/13955ba5c9116ee5085fb0e4840aabe3d8f2fab4/src/ATen/native/xpu/sycl/SharedReduceOps.h#L352

Called from: https://github.com/intel/torch-xpu-ops/blob/13955ba5c9116ee5085fb0e4840aabe3d8f2fab4/src/ATen/native/xpu/sycl/Reduce.h#L999

bader commented 2 months ago

According to the call stack, the crash happens in the IGC compiler, which is developed at https://github.com/intel/intel-graphics-compiler/. @dvrogozh, did you report this issue to the IGC team?

dvrogozh commented 2 months ago

> @dvrogozh, did you report this issue to the IGC team?

No. That's up to the intel/llvm team. However, I am talking to the IGC team right now and will post an update if there are any findings.

dvrogozh commented 2 months ago

@paigeale from the IGC team helped to debug the issue and create an IGC-level reproducer. This seems to be an IGC-side bug, so I have filed https://github.com/intel/intel-graphics-compiler/issues/340.

bader commented 2 months ago

Great. @dvrogozh, I propose that we close this issue and monitor the progress through the IGC issue. Does that plan work for you?

dvrogozh commented 2 months ago

Let's wait a couple of days to see the IGC issue processed. I hope to get a PR with the fix from them.

dvrogozh commented 1 month ago

Fixed by https://github.com/intel/intel-graphics-compiler/commit/66d001e52c8e496f51c2572acc2377ca8f4e9e50

intel/llvm issue #15082