llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.84k stars 11.47k forks source link

Segmentation fault in LookupBucketFor with a SYCL parallel reduction using pocl CUDA device #47864

Open fe221a9f-9db9-44e9-9909-246d58be1e1f opened 3 years ago

fe221a9f-9db9-44e9-9909-246d58be1e1f commented 3 years ago
Bugzilla Link 48520
Version 11.0
OS Linux

Extended Description

See also pocl issue #​895 at https://github.com/pocl/pocl/issues/895

I am getting a segmentation fault in LLVM 11 (reproduced also in LLVM 10) from Debian unstable when using pocl's CUDA device with an NVIDIA GPU to run a simple parallel reduction implemented in SYCL and compiled with CodePlay's ComputeCpp CE 2.3

The backtrace is the following:

​0 0x00007fffe9644636 in LookupBucketFor<llvm::Type const*> () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/include/llvm/ADT/DenseMap.h:613

​1 find () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/include/llvm/ADT/DenseMap.h:161

​2 getTypeID () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/ValueEnumerator.h:166

​3 writeConstants () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:2545

​4 0x00007fffe9632622 in writeModuleConstants () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:2568

​5 write () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:4298

​6 0x00007fffe962dd3c in writeModule () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:4494

​7 0x00007fffe963803b in WriteBitcodeToFile () at /build/llvm-toolchain-11-eHqKZY/llvm-toolchain-11-11.0.0/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp:4520

​8 0x00007fffdfeecf0c in pocl_write_module (module=0x7ffff0002670, path=0x7ffff7566ff0 "/home/oblomov/.cache/pocl/kcache/DD/ACFLBNIKIOELDBLLGLMICHPFMGGGHOPMOFLBA/SYCL_struct_vecinit/1024-1-1-goffs0-smallgrid/parallel.bc",

dont_rewrite=0) at /home/oblomov/src/opencl/pocl/lib/llvmopencl/LLVMFileUtils.cc:421

​9 0x00007fffdfebe09b in pocl_cache_write_kernel_parallel_bc (bc=0x7ffff0002670, program=0x43b2a0, device_i=0, kernel=0x1220780, command=0x7ffff00020c0, specialize=1) at /home/oblomov/src/opencl/pocl/lib/CL/pocl_cache.c:359

​10 0x00007fffdff4aa62 in pocl_llvm_generate_workgroup_function (DeviceI=0, Device=0x779078, Kernel=0x1220780, Command=0x7ffff00020c0, Specialize=1) at /home/oblomov/src/opencl/pocl/lib/CL/pocl_llvm_wg.cc:485

​11 0x00007fffdc0a0a66 in load_or_generate_kernel (kernel=0x1220780, device=0x779078, has_offsets=0, device_i=0, command=0x7ffff00020c0, specialized=1) at /home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-cuda.c:827

​12 0x00007fffdc0a104e in pocl_cuda_submit_kernel (stream=0xdc5a20, cmd=0x7ffff00020c0, device=0x779078, event=0x7ffff00021a0) at /home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-cuda.c:913

​13 0x00007fffdc0a2207 in pocl_cuda_submit_node (node=0x7ffff00020c0, cq=0x438f20, locked=1) at /home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-cuda.c:1256

​14 0x00007fffdc0a2530 in pocl_cuda_submit (node=0x7ffff00020c0, cq=0x438f20) at /home/oblomov/src/opencl/pocl/lib/CL/devices/cuda/pocl-cuda.c:1320

​15 0x00007fffdfea8c84 in pocl_command_enqueue (command_queue=0x438f20, node=0x7ffff00020c0) at /home/oblomov/src/opencl/pocl/lib/CL/pocl_util.c:576

​16 0x00007fffdfe89047 in POclEnqueueNDRangeKernel (command_queue=0x438f20, kernel=0x1220780, work_dim=1, global_work_offset=0x7ffff7568b10, global_work_size=0x7ffff7568af0, local_work_size=0x0, num_events_in_wait_list=0,

event_wait_list=0x0, event=0x7ffff7568ad8) at /home/oblomov/src/opencl/pocl/lib/CL/clEnqueueNDRangeKernel.c:369

​17 0x00007ffff7c94ccb in cl::sycl::detail::queue::enqueue_nd_range(std::shared_ptr const&, cl::sycl::detail::nd_range_base const&, int, std::vector<_cl_event, std::allocator<_cl_event> > const&) ()

from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​18 0x00007ffff7c7958b in cl::sycl::detail::kernel_container::run(std::shared_ptr const&, cl::sycl::detail::event_list*, bool) () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​19 0x00007ffff7c3b154 in cl::sycl::detail::command::run_kernel(std::shared_ptr const&, cl::sycl::detail::event_list*, bool) () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​20 0x00007ffff7ca4c60 in cl::sycl::detail::transaction::commit(std::shared_ptr const&) () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​21 0x00007ffff7ca7f63 in cl::sycl::detail::transaction::state<(cl::sycl::detail::trans_detail::status_t)3>::enter() () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​22 0x00007ffff7ca68d1 in cl::sycl::detail::transaction::change_to(cl::sycl::detail::trans_detail::status_t, cl::sycl::detail::trans_detail::status_t) () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​23 0x00007ffff7ca6feb in cl::sycl::detail::transaction::next_state() () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​24 0x00007ffff7c984d9 in cl::sycl::detail::scheduler::execute_front() () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​25 0x00007ffff7c988e1 in cl::sycl::detail::scheduler::scheduler_loop() () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​26 0x00007ffff7ef2120 in execute_native_thread_routine () from /opt/codeplay/computecpp/bin/../lib/libComputeCpp.so

​27 0x00007ffff7580ea7 in start_thread (arg=) at pthread_create.c:477

​28 0x00007ffff769fd8f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

fe221a9f-9db9-44e9-9909-246d58be1e1f commented 3 years ago

Some additional information:

I do understand that it's possible that the issue may be in the SPIR produced by the CodePlay compiler, but even if this is the case, I don't think the LLVM library should crash instead of throwing an appropriate error (assuming, of course, that the SPIR anomaly can be caught).