llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

NVPTX cannot select dynamic_stackalloc with OpenMP and Eigen #64017

Open markdewing opened 1 year ago

markdewing commented 1 year ago

Compiling Eigen with OpenMP offload gives the following error:

LLVM ERROR: Cannot select: t13: i64,ch = dynamic_stackalloc t4:1, t23, Constant:i64<0>
  t23: i64 = shl t4, Constant:i32<3>
    t4: i64,ch = load<(dereferenceable load (s64) from %ir.105)> t0, FrameIndex:i64<99>, undef:i64
      t1: i64 = FrameIndex<99>
      t3: i64 = undef
    t24: i32 = Constant<3>
  t2: i64 = Constant<0>
In function: _ZN5Eigen8internal23triangular_solve_matrixIdlLi1ELi2ELb0ELi0ELi0ELi1EE3runEllPKdlPdllRNS0_15level3_blockingIddEE
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /home/mdewing/prev/software/llvm/usr_main/bin/clang-linker-wrapper --cuda-path=/usr/local/cuda --host-triple=x86_64-unknown-linux-gnu --linker-path=/usr/bin/ld -- -pie -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginS.o -L/home/mdewing/prev/software/llvm/usr_main/bin/../lib/x86_64-unknown-linux-gnu -L/home/mdewing/prev/software/llvm/usr_main/lib/clang/17/lib/x86_64-unknown-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/12 -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib -L/usr/lib /tmp/testEigenNoFit_cpu-ec64de.o -lstdc++ -lm -lomp -lomptarget -lomptarget.devicertl -L/home/mdewing/prev/software/llvm/usr_main/lib -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o /lib/x86_64-linux-gnu/crtn.o
1.  Running pass 'Function Pass Manager' on module 'ld-temp.o'.
2.  Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@_ZN5Eigen8internal23triangular_solve_matrixIdlLi1ELi2ELb0ELi0ELi0ELi1EE3runEllPKdlPdllRNS0_15level3_blockingIddEE'

The full stack trace is in the attached file (out.txt), along with a reproducer and a compile script: eigen_alloc_select.tar.gz

If EIGEN_ALLOCA is forcibly disabled (#undef EIGEN_ALLOCA after it is defined in Eigen/src/Core/Memory.h), the compilation gets further but has a different 'cannot select' error.
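For readers unfamiliar with the failing node, here is a minimal sketch of the kind of source pattern that typically produces a `dynamic_stackalloc`: a stack allocation whose byte count is only known at run time. The function name `scaled_sum_alloca` is hypothetical and is not taken from Eigen or from the attached reproducer; `__builtin_alloca` is assumed to be available (GCC/Clang). The size computation `n * sizeof(double)` corresponds to the `shl ..., 3` feeding the allocation in the DAG dump above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only (not Eigen code).
// Sums n doubles after scaling them through a run-time-sized scratch buffer.
double scaled_sum_alloca(const double* in, std::size_t n, double factor) {
    assert(n > 0 && n <= 1024);  // keep the stack request small in this sketch

    // The byte count n * sizeof(double) is n << 3 and is not a compile-time
    // constant, so the stack frame has to grow by a run-time amount here.
    // Assumption: GCC/Clang builtin; plain alloca() behaves the same way.
    double* scratch =
        static_cast<double*>(__builtin_alloca(n * sizeof(double)));

    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = in[i] * factor;
    }

    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += scratch[i];
    }
    return total;
}
```

On the host this compiles and runs normally. The assumption in this sketch is that the same pattern, when compiled for the NVPTX device side, is what instruction selection cannot handle.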

Artem-B commented 1 year ago

You may be the first person trying Eigen+OpenMP on a GPU. Typically folks use native GPU support in Eigen.

Alloca is indeed not supported by the NVPTX backend. Earlier versions of PTX did not support it at all.
PTX 7.3 (CUDA 11.3) did introduce an alloca instruction, but so far there has been no need to implement it in the backend. Adding it is on my to-do list, but it's pretty far down.

For high-performance code, use of the stack on a GPU is quite often a showstopper, so if the code is intended for practical use, figuring out how to avoid stack use altogether may be worth the effort.
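One way to read "avoid stack use altogether" is to have the caller supply the scratch storage, so the function itself never requests run-time-sized stack space. The sketch below is an assumption about how such a rewrite could look, not something Eigen provides; the names `scaled_sum_workspace` and `scaled_sum_heap` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rewrite: the caller owns the scratch buffer, so no
// run-time-sized stack allocation is needed inside the function.
double scaled_sum_workspace(const double* in, std::size_t n, double factor,
                            double* scratch) {
    assert(scratch != nullptr);  // caller must provide at least n doubles

    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = in[i] * factor;
    }

    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += scratch[i];
    }
    return total;
}

// Host-side convenience wrapper that takes the scratch space from the heap.
double scaled_sum_heap(const double* in, std::size_t n, double factor) {
    std::vector<double> scratch(n);
    return scaled_sum_workspace(in, n, factor, scratch.data());
}
```

Whether the workspace is preallocated once and reused, or allocated per call, is a design choice that depends on the surrounding code. The point of the sketch is only that the buffer size no longer has to be resolved on the stack at run time.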

Artem-B commented 1 year ago

@jhuber6 FYI. Not sure how common this is/will be for OpenMP.

markdewing commented 1 year ago

This is part of a project to evaluate different portability frameworks for GPUs.