Combined Intel+NVIDIA+AMD Build Crashes Compiler (2024.03.13.)

intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

Describe the bug

Recently, with help from @fwyzard and Aurora (unfortunately I don't know her GitHub username... :frowning:), we did some experiments with building hybrid Intel+NVIDIA+AMD binaries using the oneAPI compiler. The winning incantation was of the following kind:

icpx -fsycl \
   -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 \
   -Xclang -opaque-pointers \
   -Wno-unknown-cuda-version

This works well for "simple enough" source code. :wink: But similar to #8065, once I try to build our "most complicated" project, things break. :frowning: However, with the latest version of oneAPI (2024.0.1), the failure is different from the one I described in #8065. During the build I get:

...
[ 88%] Building SYCL object device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o
cd /data/ssd-1tb/projects/traccc/build/device/sycl && /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/icpx -x c++ -DALGEBRA_PLUGINS_INCLUDE_ARRAY -DCOVFIE_QUIET -DDETRAY_ALGEBRA_ARRAY -DDETRAY_ALGEBRA_EIGEN -DDETRAY_ALGEBRA_VC -DDETRAY_CUSTOM_SCALARTYPE=float -DEIGEN_NO_CUDA -DEIGEN_NO_HIP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTRACCC_CUSTOM_SCALARTYPE=float -DVECMEM_DEBUG_MSG_LVL=0 -DVECMEM_HAVE_PMR_MEMORY_RESOURCE -DVECMEM_HAVE_SYCL_ATOMIC_REF -DVECMEM_HAVE_SYCL_LOCAL_ACCESSOR -DVECMEM_SOURCE_DIR_LENGTH=37 -DVECMEM_SYCL_PRINTF_FUNCTION=cl::sycl::ext::oneapi::experimental::printf -Dtraccc_sycl_EXPORTS -I/data/ssd-1tb/projects/traccc/traccc/device/sycl/include -I/data/ssd-1tb/projects/traccc/traccc/core/include -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-build/core/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-src/core/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/core/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-build/core/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/thrust-src/thrust/cmake/../.. -I/data/ssd-1tb/projects/traccc/build/_deps/thrust-src/dependencies/libcudacxx/include -I/data/ssd-1tb/projects/traccc/build/_deps/thrust-src/dependencies/cub/cub/cmake/../.. 
-I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/array/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/array_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/common/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/array/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/common/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/array/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/vecmem/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vecmem_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/vecmem/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/eigen_eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/eigen/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vc_vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vc_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/utils/include -I/data/ssd-1tb/projects/traccc/build/_deps/covfie-src/lib/core -I/data/ssd-1tb/projects/traccc/build/_deps/detray-src/io/include -I/data/ssd-1tb/projects/traccc/build/_deps/nlohmann_json-src/include 
-I/data/ssd-1tb/projects/traccc/traccc/device/common/include -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-build/sycl/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-src/sycl/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/eigen3-src -isystem /data/ssd-1tb/projects/traccc/build/_deps/vc-src -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wall -Wextra -Wno-unknown-cuda-version -Wshadow -Wunused-local-typedefs -pedantic -fPIC -MD -MT device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o -MF CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o.d -o CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o -c /data/ssd-1tb/projects/traccc/traccc/device/sycl/src/fitting/fitting_algorithm.sycl
fatal error: error in backend: SmallVector unable to grow. Requested capacity (97325075862528) is larger than maximum value for size type (4294967295)
icpx: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler
Configuration file: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s).
make[3]: *** [device/sycl/CMakeFiles/traccc_sycl.dir/build.make:88: device/sycl/CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o] Error 1
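
The requested capacity in the message above is compared against 4294967295, the largest value an unsigned 32-bit size type can hold. When comparing the values reported by different runs, it can help to print them in hexadecimal. A minimal sketch; `to_hex` is a hypothetical helper name, not part of any oneAPI tool:

```shell
# Hypothetical helper: print a decimal integer in hexadecimal.
# Assumes the shell's printf handles 64-bit integers (true for bash and dash
# on common 64-bit Linux systems).
to_hex() {
    printf '0x%x\n' "$1"
}
```

For example, `to_hex 4294967295` prints `0xffffffff`, and the same call can be applied to each requested capacity quoted in this ticket.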

To reproduce

As in the other ticket, reproducing this "from scratch" is a bit of an involved process unfortunately. :frowning: But since the problem is during compilation in this case, I can "simply" give you the preprocessed sources. :smile: You can download that file (it's pretty big) from:

https://cernbox.cern.ch/s/Dc83iva0Dl2czCF

With that file downloaded, you can reproduce this issue like:

icpx -x c++-cpp-output -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wno-unknown-cuda-version -fPIC -c -o fitting_algorithm.sycl.o fitting_algorithm.sycl.i

The thing is, it doesn't always crash. :thinking: Just now, I was able to execute:

[bash][Legolas]:fitting > /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/icpx -x c++-cpp-output -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wno-unknown-cuda-version -fPIC -c -o fitting_algorithm.sycl.o fitting_algorithm.sycl.i
[bash][Legolas]:fitting >

While a little earlier the preprocessed file made my compilation crash as expected. :confused:

[bash][Legolas]:sycl > /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/icpx -x c++-cpp-output -fsycl  -Wno-unknown-cuda-version -fsycl-targets=spir64,spir64_x86_64,nvidia_gpu_sm_86,amd_gpu_gfx803 -Xclang -opaque-pointers -std=c++17 -O3 -Wno-unknown-cuda-version -fPIC -c -o CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.o CMakeFiles/traccc_sycl.dir/src/fitting/fitting_algorithm.sycl.i 2>&1 | tee error.log
fatal error: error in backend: SmallVector unable to grow. Requested capacity (109556080997376) is larger than maximum value for size type (4294967295)
icpx: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler
Configuration file: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.
[bash][Legolas]:sycl >

(I was in a different subdirectory on that try...)
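
Since the crash is intermittent, a small loop can give a rough idea of how often a given command line fails. A minimal sketch; `count_failures` is a hypothetical helper name, and it only looks at the exit status of whatever command is passed after the run count:

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only a non-zero exit status counts as a failure.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

Calling `count_failures 20` followed by the full `icpx` command line shown above would print the number of failed compilations out of 20 attempts.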

Environment

OS: Linux
Compiler version:

Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler
Configuration file: /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/bin/compiler/../icpx.cfg
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /home/krasznaa/software/nvidia/cuda-12.4.0/x86_64, version 
Found HIP installation: /opt/rocm-6.0.2, version 6.0.32831

Don't be fooled by what the compiler reports for HIP though. :thinking: The oneAPI plugin is set up to use ROCm-5.4.6.

[bash][Legolas]:build > ldd -r /home/krasznaa/software/intel/oneapi-2024.0.1/compiler/2024.0/lib/libpi_hip.so 
    linux-vdso.so.1 (0x00007ffff9462000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8f0d103000)
    libamdhip64.so.5 => /opt/rocm/lib/libamdhip64.so.5 (0x00007f8f0b600000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f8f0b200000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8f0ae00000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8f0d174000)
    libhsa-runtime64.so.1 => /opt/rocm/lib/libhsa-runtime64.so.1 (0x00007f8f0aa00000)
    libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f8f0d0f4000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8f0d00d000)
    libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1 (0x00007f8f0cfed000)
    libdrm.so.2 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007f8f0cfd3000)
    libdrm_amdgpu.so.1 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1 (0x00007f8f0cfc5000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f8f0cfa9000)
[bash][Legolas]:build > ls -l /opt/rocm
lrwxrwxrwx 1 root root 22 Sep  5  2023 /opt/rocm -> /etc/alternatives/rocm
[bash][Legolas]:build > ls -l /etc/alternatives/rocm
lrwxrwxrwx 1 root root 15 Sep 30 14:11 /etc/alternatives/rocm -> /opt/rocm-5.4.6
[bash][Legolas]:build >

I just happen to be doing a native HIP build with version 6.0.2 at the same time...
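
The manual `ldd` and `ls -l` steps above can be combined into one filter that resolves the symlinks for every library under a given prefix. A minimal sketch; `resolve_libs_under` is a hypothetical helper name, and it assumes the usual `name => path (address)` layout of `ldd` output and a `readlink` that supports `-f`:

```shell
# Hypothetical helper: read `ldd` output on stdin and print the fully
# resolved path of every library whose path starts with the given prefix.
# Usage: ldd <shared object> | resolve_libs_under <prefix>
resolve_libs_under() {
    prefix=$1
    awk -v p="$prefix" '$2 == "=>" && index($3, p) == 1 { print $3 }' |
    while read -r lib; do
        # Follow all symlinks so the real installation directory is shown.
        readlink -f "$lib"
    done
}
```

Piping the `ldd` output for `libpi_hip.so` into `resolve_libs_under /opt/rocm` would list the real locations of the ROCm libraries that the plugin picks up.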

Additional context

The build for "just" an Intel+NVIDIA backend does work. Only once I try to build for all 3 at the same time do I see this issue pop up. :thinking:

LLVM ERROR: SmallVector unable to grow. Requested capacity (94570172698048) is larger than maximum value for size type (4294967295)
PLEASE submit a bug report to https://software.intel.com/en-us/support/priority-support and include the crash backtrace.
Stack dump:
0. Program arguments: /opt/intel/oneapi/compiler/2024.0/bin/compiler/lld -flavor gnu -m elf64_amdgpu --no-undefined -shared -plugin-opt=-amdgpu-internalize-symbols -plugin-opt=mcpu=gfx900 -plugin-opt=O2 --lto-CGO2 -plugin-opt=-vector-library=SVML -plugin-opt=fintel-libirc-allowed -plugin-opt=-disable-hir-generate-mkl-call -plugin-opt=-loopopt=1 -plugin-opt=-intel-abi-compatible=true -plugin-opt=-x86-enable-unaligned-vector-move=true --whole-archive -o /tmp/BrokenLineFitOnGPU-gfx900-bbc203-de42e9.out /tmp/BrokenLineFitOnGPU-gfx900-9738be-475de6.o --no-whole-archive
 #0 0x00005602d75bd443 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d62443)
 #1 0x00005602d75bb930 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d60930)
 #2 0x00005602d75bdd74 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fef72e54db0 __restore_rt (/lib64/libc.so.6+0x54db0)
 #4 0x00007fef72ea154c __pthread_kill_implementation (/lib64/libc.so.6+0xa154c)
 #5 0x00007fef72e54d06 gsignal (/lib64/libc.so.6+0x54d06)
 #6 0x00007fef72e287f3 abort (/lib64/libc.so.6+0x287f3)
 #7 0x00005602d7557062 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2cfc062)
 #8 0x00005602d755bde0 report_at_maximum_capacity(unsigned long) SmallVector.cpp:0:0
 #9 0x00005602d755ba48 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d00a48)
#10 0x00005602d794bd17 llvm::SmallVector<int, 64u>::SmallVector<int, void>(llvm::ArrayRef<int>) X86ISelLowering.cpp:0:0
#11 0x00005602d7951c6f lowerShuffleAsBlend(llvm::SDLoc const&, llvm::MVT, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>, llvm::APInt const&, llvm::X86Subtarget const&, llvm::SelectionDAG&) X86ISelLowering.cpp:0:0
#12 0x00005602d794dd48 lowerV4F32Shuffle(llvm::SDLoc const&, llvm::ArrayRef<int>, llvm::APInt const&, llvm::SDValue, llvm::SDValue, llvm::X86Subtarget const&, llvm::SelectionDAG&) X86ISelLowering.cpp:0:0
#13 0x00005602d78c0bf8 llvm::X86TargetLowering::LowerOperation(llvm::SDValue, llvm::SelectionDAG&) const X86ISelLowering.cpp:0:0
#14 0x00005602d9bd0081 llvm::DataLayout::getABITypeAlign(llvm::Type*) const (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x5375081)
#15 0x00005602d877d6b3 llvm::vpo::VPlanTTICostModel::getMemInstAlignment(llvm::vpo::VPLoadStoreInst const*) const (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3f226b3)
#16 0x00005602d877f75f llvm::vpo::VPlanTTICostModel::getTTICostForVF(llvm::vpo::VPInstruction const*, unsigned int) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3f2475f)
#17 0x00005602d877f5bc llvm::vpo::VPlanTTICostModel::getTTICost(llvm::vpo::VPInstruction const*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3f245bc)
#18 0x00005602d87094e0 llvm::vpo::VPlanCostModelWithHeuristics<llvm::vpo::HeuristicsList<llvm::vpo::VPInstruction const>, llvm::vpo::HeuristicsList<llvm::vpo::VPBasicBlock const>, llvm::vpo::HeuristicsList<llvm::vpo::VPlanVector const, llvm::vpo::VPlanCostModelHeuristics::HeuristicSLP, llvm::vpo::VPlanCostModelHeuristics::HeuristicGatherScatter, llvm::vpo::VPlanCostModelHeuristics::HeuristicSpillFill, llvm::vpo::VPlanCostModelHeuristics::HeuristicPsadbw, llvm::vpo::VPlanCostModelHeuristics::HeuristicUnroll>>::getCostImpl(llvm::vpo::VPBasicBlock const*, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#19 0x00005602d8709716 llvm::vpo::VPInstructionCost llvm::vpo::VPlanCostModelWithHeuristics<llvm::vpo::HeuristicsList<llvm::vpo::VPInstruction const>, llvm::vpo::HeuristicsList<llvm::vpo::VPBasicBlock const>, llvm::vpo::HeuristicsList<llvm::vpo::VPlanVector const, llvm::vpo::VPlanCostModelHeuristics::HeuristicSLP, llvm::vpo::VPlanCostModelHeuristics::HeuristicGatherScatter, llvm::vpo::VPlanCostModelHeuristics::HeuristicSpillFill, llvm::vpo::VPlanCostModelHeuristics::HeuristicPsadbw, llvm::vpo::VPlanCostModelHeuristics::HeuristicUnroll>>::getRangeCost<llvm::iterator_range<llvm::vpo::sese_df_iterator<llvm::vpo::VPBasicBlock*>>>(llvm::iterator_range<llvm::vpo::sese_df_iterator<llvm::vpo::VPBasicBlock*>>, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#20 0x00005602d8708ca7 llvm::vpo::VPlanCostModelWithHeuristics<llvm::vpo::HeuristicsList<llvm::vpo::VPInstruction const>, llvm::vpo::HeuristicsList<llvm::vpo::VPBasicBlock const>, llvm::vpo::HeuristicsList<llvm::vpo::VPlanVector const, llvm::vpo::VPlanCostModelHeuristics::HeuristicSLP, llvm::vpo::VPlanCostModelHeuristics::HeuristicGatherScatter, llvm::vpo::VPlanCostModelHeuristics::HeuristicSpillFill, llvm::vpo::VPlanCostModelHeuristics::HeuristicPsadbw, llvm::vpo::VPlanCostModelHeuristics::HeuristicUnroll>>::getCost(bool, llvm::vpo::VPlanPeelingVariant*, llvm::raw_ostream*) IntelLoopVectorizationPlanner.cpp:0:0
#21 0x00005602d86fc377 llvm::vpo::LoopVectorizationPlanner::selectBestPlan() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3ea1377)
#22 0x00005602d86e9980 llvm::vpo::VPlanDriverHIRImpl::processLoop(llvm::loopopt::HLLoop*, llvm::Function&, llvm::vpo::WRNVecLoopNode*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3e8e980)
#23 0x00005602d86ec883 llvm::vpo::VPlanDriverHIRImpl::runImpl(llvm::Function&, llvm::loopopt::HIRFramework*, llvm::loopopt::HIRLoopStatistics*, llvm::loopopt::HIRDDAnalysis*, llvm::loopopt::HIRSafeReductionAnalysis*, llvm::OptReportVerbosity::Level, llvm::vpo::WRegionInfo*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AssumptionCache*, llvm::DominatorTree*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3e91883)
#24 0x00005602d86ec627 llvm::vpo::VPlanDriverHIRPass::runImpl(llvm::Function&, llvm::AnalysisManager<llvm::Function>&, llvm::loopopt::HIRFramework&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x3e91627)
#25 0x00005602d83219e5 llvm::detail::PassModel<llvm::Function, llvm::vpo::VPlanDriverHIRPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#26 0x00005602d9c92d63 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x5437d63)
#27 0x00005602d7ba304d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) NVPTXTargetMachine.cpp:0:0
#28 0x00005602d9c996f2 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x543e6f2)
#29 0x00005602d7ba2ddd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) NVPTXTargetMachine.cpp:0:0
#30 0x00005602d9c91d63 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x5436d63)
#31 0x00005602d8253b10 runNewPMPasses(llvm::lto::Config const&, llvm::Module&, llvm::TargetMachine*, unsigned int, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*) LTOBackend.cpp:0:0
#32 0x00005602d82534bc llvm::lto::opt(llvm::lto::Config const&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*, std::vector<unsigned char, std::allocator<unsigned char>> const&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39f84bc)
#33 0x00005602d825456c llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39f956c)
#34 0x00005602d8245eed llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39eaeed)
#35 0x00005602d8245610 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x39ea610)
#36 0x00005602d7739a45 lld::elf::BitcodeCompiler::compile() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2edea45)
#37 0x00005602d769c7c1 lld::elf::LinkerDriver::link(llvm::opt::InputArgList&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2e417c1)
#38 0x00005602d768f5ec lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2e345ec)
#39 0x00005602d768d908 lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2e32908)
#40 0x00005602d75bf820 lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2d64820)
#41 0x00005602d7543226 lld_main(int, char**, llvm::ToolContext const&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2ce8226)
#42 0x00005602d754386e main (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2ce886e)
#43 0x00007fef72e3feb0 __libc_start_call_main (/lib64/libc.so.6+0x3feb0)
#44 0x00007fef72e3ff60 __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3ff60)
#45 0x00005602d7542ee9 _start (/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld+0x2ce7ee9)
llvm-foreach: Aborted (core dumped)
icpx: fatal error: amdgcn-link command failed with exit code 254 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.0/bin/compiler/../icpx.cfg

/opt/intel/oneapi/compiler/2024.0/bin/compiler/lld -flavor gnu -m elf64_amdgpu --no-undefined -shared -plugin-opt=-amdgpu-internalize-symbols -plugin-opt=mcpu=gfx900 -plugin-opt=O2 --lto-CGO2 -plugin-opt=-vector-library=SVML -plugin-opt=fintel-libirc-allowed -plugin-opt=-disable-hir-generate-mkl-call -plugin-opt=-loopopt=1 -plugin-opt=-intel-abi-compatible=true -plugin-opt=-x86-enable-unaligned-vector-move=true --whole-archive -o BrokenLineFitOnGPU-gfx900-bbc203-de42e9.out BrokenLineFitOnGPU-gfx900-9738be-475de6.o --no-whole-archive
