intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

Combined Intel+NVIDIA+AMD Build Crashes Compiler (2024.03.13.) #13010

Open krasznaa opened 8 months ago

krasznaa commented 8 months ago

Describe the bug

Recently, with help from @fwyzard and Aurora (I don't know her GitHub username unfortunately... :frowning:), we did some experiments with building hybrid Intel+NVIDIA+AMD binaries using the oneAPI compiler. The winning incantation was of the following kind:

icpx -fsycl \
   -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 \
   -Xclang -opaque-pointers \
   -Wno-unknown-cuda-version

This works well for "simple enough" source code. :wink: But similar to #8065, once I try to build our "most complicated" project, things break. :frowning: However, with the current latest version of oneAPI (2024.0.1), the failure is different than what I described in #8065. During the build I get:

...
[ 88%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o
cd /data/ssd-1tb/projects/traccc/build/device/sycl && /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/icpx -x c++ -DALGEBRA_PLUGINS_INCLUDE_ARRAY -DCOVFIE_QUIET -DDETRAY_ALGEBRA_ARRAY -DDETRAY_ALGEBRA_EIGEN -DDETRAY_ALGEBRA_VC -DDETRAY_CUSTOM_SCALARTYPE=float -DEIGEN_NO_CUDA -DEIGEN_NO_HIP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTRACCC_CUSTOM_SCALARTYPE=float -DVECMEM_DEBUG_MSG_LVL=0 -DVECMEM_HAVE_PMR_MEMORY_RESOURCE -DVECMEM_HAVE_SYCL_ATOMIC_REF -DVECMEM_HAVE_SYCL_LOCAL_ACCESSOR -DVECMEM_SOURCE_DIR_LENGTH=37 -DVECMEM_SYCL_PRINTF_FUNCTION=cl::sycl::ext::oneapi::experimental::printf -Dtraccc_sycl_EXPORTS -I/data/ssd-1tb/projects/traccc/traccc/device/sycl/include -I/data/ssd-1tb/projects/traccc/traccc/core/include -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-build/core/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-src/core/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/core/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-build/core/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/thrust-src/thrust/cmake/../.. -I/data/ssd-1tb/projects/traccc/build/_deps/thrust-src/dependencies/libcudacxx/include -I/data/ssd-1tb/projects/traccc/build/_deps/thrust-src/dependencies/cub/cub/cmake/../.. 
-I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/array/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/array_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/common/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/array/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/common/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/array/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/vecmem/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vecmem_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/vecmem/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/eigen_eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/eigen/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vc_vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vc_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/utils/include -I/data/ssd-1tb/projects/traccc/build/_deps/covfie-src/lib/core -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/io/include -I/data/ssd-1tb/projects/traccc/build/_deps/nlohmann_json-src/include 
-I/data/ssd-1tb/projects/traccc/traccc/device/common/include -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-build/sycl/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-src/sycl/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/eigen3-src -isystem /data/ssd-1tb/projects/traccc/build/_deps/vc-src -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wall -Wextra -Wno-unknown-cuda-version -Wshadow -Wunused-local-typedefs -pedantic -fPIC -MD -MT device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o -MF CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o.d -o CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o -c /data/ssd-1tb/projects/traccc/traccc/device/sycl/src/fitting/fitting_algorithm.sycl
fatal error: error in backend: SmallVector unable to grow. Requested capacity (97325075862528) is larger than maximum value for size type (4294967295)
icpx: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler
Configuration file: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
make[3]: *** [device/sycl/CMakeFiles/traccc_sycl.dir/build.make:88: device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o] Error 1

To reproduce

As in the other ticket, reproducing this "from scratch" is a bit of an involved process unfortunately. :frowning: But since the problem is during compilation in this case, I can "simply" give you the preprocessed sources. :smile: You can download that file (it's pretty big) from:

https://cernbox.cern.ch/s/Dc83iva0Dl2czCF

With that file downloaded, you can reproduce this issue like:

icpx -x c++-cpp-output -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wno-unknown-cuda-version -fPIC -c -o fitting_algorithm.sycl.o fitting_algorithm.sycl.i

The thing is, it doesn't always crash. :thinking: Just now, I was able to execute:

[bash][Legolas]:fitting > /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/icpx -x c++-cpp-output -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wno-unknown-cuda-version -fPIC -c -o fitting_algorithm.sycl.o fitting_algorithm.sycl.i
[bash][Legolas]:fitting >

While a little earlier the preprocessed file made my compilation crash as expected. :confused:

[bash][Legolas]:sycl > /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/icpx -x c++-cpp-output -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wno-unknown-cuda-version -fPIC -c -o CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.i 2>&1 | tee error.log
fatal error: error in backend: SmallVector unable to grow. Requested capacity (109556080997376) is larger than maximum value for size type (4294967295)
icpx: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler
Configuration file: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.
[bash][Legolas]:sycl >

(I was in a different subdirectory on that try...)
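
Since the crash is intermittent, it may help to measure how often it happens rather than rely on single runs. A minimal sketch, assuming a POSIX shell; `count_failures` is a hypothetical helper name, not something used in this thread:

```shell
# Hypothetical helper: run a command N times and print how many of the
# runs exited with a non-zero status. The body runs in a subshell, so
# its variables do not leak into the calling shell.
count_failures() (
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Only the exit status matters here, so discard all output.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
)

# Example (commented out): repeat the reproducer command 20 times.
# count_failures 20 icpx -x c++-cpp-output -fsycl -Wno-unknown-cuda-version \
#     -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 \
#     -Xclang -opaque-pointers -std=c++17 -O3 -fPIC \
#     -c -o fitting_algorithm.sycl.o fitting_algorithm.sycl.i
```

A count of zero over a handful of runs would not prove the problem is gone, given how irregular the crash appears to be.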

Environment

Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler
Configuration file: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler/../icpx.cfg
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /home/krasznaa/software/nvidia/cuda-12.4.0/x86_64, version 
Found HIP installation: /opt/rocm-6.0.2, version 6.0.32831

Don't be fooled by what the compiler reports for HIP though. :thinking: The oneAPI plugin is set up to use ROCm-5.4.6.

[bash][Legolas]:build > ldd -r /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/lib/libpi_hip.so 
    linux-vdso.so.1 (0x00007ffff9462000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8f0d103000)
    libamdhip64.so.5 => /opt/rocm/lib/libamdhip64.so.5 (0x00007f8f0b600000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f8f0b200000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8f0ae00000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8f0d174000)
    libhsa-runtime64.so.1 => /opt/rocm/lib/libhsa-runtime64.so.1 (0x00007f8f0aa00000)
    libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f8f0d0f4000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8f0d00d000)
    libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1 (0x00007f8f0cfed000)
    libdrm.so.2 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007f8f0cfd3000)
    libdrm_amdgpu.so.1 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1 (0x00007f8f0cfc5000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f8f0cfa9000)
[bash][Legolas]:build > ls -l /opt/rocm
lrwxrwxrwx 1 root root 22 Sep  5  2023 /opt/rocm -> /etc/alternatives/rocm
[bash][Legolas]:build > ls -l /etc/alternatives/rocm
lrwxrwxrwx 1 root root 15 Sep 30 14:11 /etc/alternatives/rocm -> /opt/rocm-5.4.6
[bash][Legolas]:build >

I just happen to do a native HIP build with version 6.0.2 at the same time...

Additional context

The build for "just" an Intel+NVIDIA backend does work. Only once I try to build for all 3 in parallel do I see this issue pop up. :thinking:
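
One way to narrow down which target triggers the failure is to compile the same preprocessed file once per target set. The following is only a sketch under assumptions: `try_targets` and the `ICPX` variable are hypothetical names introduced here, and the flags are copied from the reproducer command in this report.

```shell
# Compiler driver to call; can be overridden through the environment.
: "${ICPX:=icpx}"

# Hypothetical helper: compile one source file once per given
# -fsycl-targets value and print OK or FAIL for each of them.
try_targets() (
    src=$1
    shift
    out=$(mktemp -d) || exit 1
    for targets in "$@"; do
        if "$ICPX" -x c++-cpp-output -fsycl -Wno-unknown-cuda-version \
               "-fsycl-targets=$targets" -Xclang -opaque-pointers \
               -std=c++17 -O3 -fPIC -c -o "$out/test.o" "$src" \
               >/dev/null 2>&1; then
            echo "OK $targets"
        else
            echo "FAIL $targets"
        fi
    done
    rm -rf "$out"
)

# Example (commented out): Intel+NVIDIA, AMD only, then all targets.
# try_targets fitting_algorithm.sycl.i \
#     spir64,spir64_x86_64,nvidia_gpu_sm_86 \
#     amd_gpu_gfx803 \
#     spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803
```

Because the crash does not happen on every run, a single OK line is weak evidence; repeating each target set several times gives a more reliable picture.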

AuroraPerego commented 8 months ago

I see the same behavior when compiling our project as well, even when compiling for AMD only.

LLVM ERROR: SmallVector unable to grow. Requested capacity (94570172698048) is larger than maximum value for size type (4294967295)
PLEASE submit a bug report to https://software.intel.com/en-us/support/priority-support and include the crash backtrace.
Stack dump:
0.      Program arguments: /opt/intel/oneapi/compiler/2024.0/bin/compiler/lld -flavor gnu -m elf64_amdgpu --no-undefined -shared -plugin-opt=-amdgpu-internalize-symbols -plugin-opt=mcpu=gfx900 -plugin-opt=O2 --lto-CGO2 -plugin-opt=-vector-library=SVML -plugin-opt=fintel-libirc-allowed -plugin-opt=-disable-hir-generate-mkl-call -plugin-opt=-loopopt=1 -plugin-opt=-intel-abi-compatible=true -plugin-opt=-x86-enable-unaligned-vector-move=true --whole-archive -o /tmp/BrokenLineFitOnGPU-gfx900-bbc203-de42e9.out /tmp/BrokenLineFitOnGPU-gfx900-9738be-475de6.o --no-whole-archive
 #0 0x00005602d75bd443 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d62443)
 #1 0x00005602d75bb930 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d60930)
 #2 0x00005602d75bdd74 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fef72e54db0 __restore_rt (/lib64/libc.so.6+0x54db0)
 #4 0x00007fef72ea154c __pthread_kill_implementation (/lib64/libc.so.6+0xa154c)
 #5 0x00007fef72e54d06 gsignal (/lib64/libc.so.6+0x54d06)
 #6 0x00007fef72e287f3 abort (/lib64/libc.so.6+0x287f3)
 #7 0x00005602d7557062 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2cfc062)
 #8 0x00005602d755bde0 report_at_maximum_capacity(unsigned long) SmallVector.cpp:0:0
 #9 0x00005602d755ba48 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d00a48)
#10 0x00005602d794bd17 llvm::SmallVector<int, 64u>::SmallVector<int, void>(llvm::ArrayRef<int>) X86ISelLowering.cpp:0:0
#11 0x00005602d7951c6f lowerShuffleAsBlend(llvm::SDLoc const&, llvm::MVT, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>, llvm::APInt const&, llvm::X86Subtarget const&, llvm::SelectionDAG&) X86ISelLowering.cpp:0:0
#12 0x00005602d794dd48 lowerV4F32Shuffle(llvm::SDLoc const&, llvm::ArrayRef<int>, llvm::APInt const&, llvm::SDValue, llvm::SDValue, llvm::X86Subtarget const&, llvm::SelectionDAG&) X86ISelLowering.cpp:0:0
#13 0x00005602d78c0bf8 llvm::X86TargetLowering::LowerOperation(llvm::SDValue, llvm::SelectionDAG&) const X86ISelLowering.cpp:0:0
#14 0x00005602d9bd0081 llvm::DataLayout::getABITypeAlign(llvm::Type*) const (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x5375081)
#15 0x00005602d877d6b3 llvm::vpo::VPlanTTICostModel::getMemInstAlignment(llvm::vpo::VPLoadStoreInst const*) const (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3f226b3)
#16 0x00005602d877f75f llvm::vpo::VPlanTTICostModel::getTTICostForVF(llvm::vpo::VPInstruction const*, unsigned int) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3f2475f)
#17 0x00005602d877f5bc llvm::vpo::VPlanTTICostModel::getTTICost(llvm::vpo::VPInstruction const*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3f245bc)
#18 0x00005602d87094e0 llvm::vpo::VPlanCostModelWithHeuristics<llvm::vpo::HeuristicsList<llvm::vpo::VPInstruction const>, llvm::vpo::HeuristicsList<llvm::vpo::VPBasicBlock const>, llvm::vpo::HeuristicsList<llvm::vpo::VPlanVector const, llvm::vpo::VPlanCostModelHeuristics::HeuristicSLP, llvm::vpo::VPlanCostModelHeuristics::HeuristicGatherScatter, llvm::vpo::VPlanCostModelHeuristics::HeuristicSpillFill, llvm::vpo::VPlanCostModelHeuristics::HeuristicPsadbw, llvm::vpo::VPlanCostModelHeuristics::HeuristicUnroll>>::getCostImpl(llvm::vpo::VPBasicBlock const*, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#19 0x00005602d8709716 llvm::vpo::VPInstructionCost llvm::vpo::VPlanCostModelWithHeuristics<llvm::vpo::HeuristicsList<llvm::vpo::VPInstruction const>, llvm::vpo::HeuristicsList<llvm::vpo::VPBasicBlock const>, llvm::vpo::HeuristicsList<llvm::vpo::VPlanVector const, llvm::vpo::VPlanCostModelHeuristics::HeuristicSLP, llvm::vpo::VPlanCostModelHeuristics::HeuristicGatherScatter, llvm::vpo::VPlanCostModelHeuristics::HeuristicSpillFill, llvm::vpo::VPlanCostModelHeuristics::HeuristicPsadbw, llvm::vpo::VPlanCostModelHeuristics::HeuristicUnroll>>::getRangeCost<llvm::iterator_range<llvm::vpo::sese_df_iterator<llvm::vpo::VPBasicBlock*>>>(llvm::iterator_range<llvm::vpo::sese_df_iterator<llvm::vpo::VPBasicBlock*>>, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#20 0x00005602d8708ca7 llvm::vpo::VPlanCostModelWithHeuristics<llvm::vpo::HeuristicsList<llvm::vpo::VPInstruction const>, llvm::vpo::HeuristicsList<llvm::vpo::VPBasicBlock const>, llvm::vpo::HeuristicsList<llvm::vpo::VPlanVector const, llvm::vpo::VPlanCostModelHeuristics::HeuristicSLP, llvm::vpo::VPlanCostModelHeuristics::HeuristicGatherScatter, llvm::vpo::VPlanCostModelHeuristics::HeuristicSpillFill, llvm::vpo::VPlanCostModelHeuristics::HeuristicPsadbw, llvm::vpo::VPlanCostModelHeuristics::HeuristicUnroll>>::getCost(bool, llvm::vpo::VPlanPeelingVariant*, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#21 0x00005602d86fc377 llvm::vpo::LoopVectorizationPlanner::selectBestPlan() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3ea1377)
#22 0x00005602d86e9980 llvm::vpo::VPlanDriverHIRImpl::processLoop(llvm::loopopt::HLLoop*, llvm::Function&, llvm::vpo::WRNVecLoopNode*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3e8e980)
#23 0x00005602d86ec883 llvm::vpo::VPlanDriverHIRImpl::runImpl(llvm::Function&, llvm::loopopt::HIRFramework*, llvm::loopopt::HIRLoopStatistics*, llvm::loopopt::HIRDDAnalysis*, llvm::loopopt::HIRSafeReductionAnalysis*, llvm::OptReportVerbosity::Level, llvm::vpo::WRegionInfo*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AssumptionCache*, llvm::DominatorTree*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3e91883)
#24 0x00005602d86ec627 llvm::vpo::VPlanDriverHIRPass::runImpl(llvm::Function&, llvm::AnalysisManager<llvm::Function>&, llvm::loopopt::HIRFramework&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3e91627)
#25 0x00005602d83219e5 llvm::detail::PassModel<llvm::Function, llvm::vpo::VPlanDriverHIRPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#26 0x00005602d9c92d63 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x5437d63)
#27 0x00005602d7ba304d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) NVPTXTargetMachine.cpp:0:0
#28 0x00005602d9c996f2 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x543e6f2)
#29 0x00005602d7ba2ddd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) NVPTXTargetMachine.cpp:0:0
#30 0x00005602d9c91d63 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x5436d63)
#31 0x00005602d8253b10 runNewPMPasses(llvm::lto::Config const&, llvm::Module&, llvm::TargetMachine*, unsigned int, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*) LTOBackend.cpp:0:0
#32 0x00005602d82534bc llvm::lto::opt(llvm::lto::Config const&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*, std::vector<unsigned char, std::allocator<unsigned char>> const&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39f84bc)
#33 0x00005602d825456c llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39f956c)
#34 0x00005602d8245eed llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39eaeed)
#35 0x00005602d8245610 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39ea610)
#36 0x00005602d7739a45 lld::elf::BitcodeCompiler::compile() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2edea45)
#37 0x00005602d769c7c1 lld::elf::LinkerDriver::link(llvm::opt::InputArgList&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2e417c1)
#38 0x00005602d768f5ec lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2e345ec)
#39 0x00005602d768d908 lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2e32908)
#40 0x00005602d75bf820 lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d64820)
#41 0x00005602d7543226 lld_main(int, char**, llvm::ToolContext const&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2ce8226)
#42 0x00005602d754386e main (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2ce886e)
#43 0x00007fef72e3feb0 __libc_start_call_main (/lib64/libc.so.6+0x3feb0)
#44 0x00007fef72e3ff60 __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3ff60)
#45 0x00005602d7542ee9 _start (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2ce7ee9)
llvm-foreach: Aborted (core dumped)
icpx: fatal error: amdgcn-link command failed with exit code 254 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.0/bin/compiler/../icpx.cfg
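
The "Requested capacity" values in the three error messages are easier to compare in hexadecimal. The sketch below only converts the numbers that already appear in the logs; `to_hex` is a hypothetical helper name.

```shell
# Print a decimal value in hexadecimal.
to_hex() {
    printf '0x%x\n' "$1"
}

# The three "Requested capacity" values reported in this thread,
# followed by the SmallVector size limit from the same messages.
for capacity in 97325075862528 109556080997376 94570172698048 4294967295; do
    to_hex "$capacity"
done
```

The last of the three capacities converts to 0x5602d58ccdc0, which shares its leading digits with the lld code addresses in the stack dump above (for example 0x00005602d75bd443). One possible reading is that the requested size is a stray pointer-like value rather than a real element count, but nobody in this thread has confirmed that.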

This is the file that causes the crash: cernbox. The command to reproduce it is:

/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld -flavor gnu -m elf64_amdgpu --no-undefined -shared -plugin-opt=-amdgpu-internalize-symbols -plugin-opt=mcpu=gfx900 -plugin-opt=O2 --lto-CGO2 -plugin-opt=-vector-library=SVML -plugin-opt=fintel-libirc-allowed -plugin-opt=-disable-hir-generate-mkl-call -plugin-opt=-loopopt=1 -plugin-opt=-intel-abi-compatible=true -plugin-opt=-x86-enable-unaligned-vector-move=true --whole-archive -o BrokenLineFitOnGPU-gfx900-bbc203-de42e9.out BrokenLineFitOnGPU-gfx900-9738be-475de6.o --no-whole-archive

To compile the entire project:

git clone -b sycl_bugfix https://github.com/AuroraPerego/pixeltrack-standalone.git
cd pixeltrack-standalone
make environment
. env.sh
make sycl

ashadrina commented 8 months ago

Thank you for raising this issue. I reproduced it and escalated it to the development team.

rafbiels commented 7 months ago

Hi @krasznaa and @AuroraPerego, this is an issue with a proprietary Intel loop optimisation (hence IntelLoopVectorizationPlanner appears in the stack trace) and it does not affect the open-source DPC++ (this repo). Nevertheless, we found a very similar issue which has been fixed by the compiler team for the next release and we believe this should also fix your issue.

In the meantime, you can use one of the following workarounds before the next release is available:

rafbiels commented 5 months ago

After testing the upcoming release candidate, I found that unfortunately the fix did not help and the release will still be affected by this bug. However, the problematic loop optimisation passes have been disabled for the AMD and NVIDIA backends for the next release after the upcoming one. This will resolve the problem, but due to the Intel oneAPI release cycle it's still some time away. In the meantime, please refer to the workarounds above.