llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

backend(OpenMP target NVPTX): Cannot select: 0x6c60f50: i16,ch = AtomicCmpSwap #43564

Open 766e2662-641f-4db9-99f5-d4a4dc8c4704 opened 4 years ago

766e2662-641f-4db9-99f5-d4a4dc8c4704 commented 4 years ago
Bugzilla Link 44219
Version 9.0
OS Linux
Attachments Temporary files from clang++ (host: preprocessed source .ii, assembly .s; nvptx: preprocessed source .ii)
CC @hfinkel,@jdoerfert,@JonChesterfield,@Artem-B

Extended Description

Internal compiler error when compiling C++11 code with OpenMP target offload for the NVPTX backend, using #pragma omp atomic directives.

Command-line: clang++ -DALPAKA_ACC_CPU_BT_OMP4_ENABLED -DALPAKA_DEBUG=0 -DBOOST_ALL_NO_LIB -DCATCH_CONFIG_FAST_COMPILE -I/home/yn622878/checkout/alpaka/test/common/include -I/home/yn622878/checkout/alpaka/include -isystem /home/yn622878/checkout/spack/opt/spack/linux-centos7-broadwell/gcc-9.2.0/boost-1.70.0-dkeb472u6a3pgudl5j4fqw7huosyngp4/include -isystem /home/yn622878/checkout/alpaka/test/../thirdParty/catch2/include -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O2 -ftemplate-backtrace-limit=0 --save-temps -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-disabled-macro-expansion -Wno-global-constructors -Wno-padded -Wno-extra-semi-stmt -fopenmp=libomp -std=gnu++11 -o CMakeFiles/atomic.dir/src/AtomicTest.cpp.o -c /home/yn622878/checkout/alpaka/test/unit/atomic/src/AtomicTest.cpp

The relevant OpenMP flags are: -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda

The code compiles without error when these are replaced by -fopenmp -fopenmp=libomp -fopenmp-targets=x86_64-pc-linux-gnu (offloading to x86 host).
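For orientation, the kind of code involved might look like the sketch below. It is a hypothetical reduced test case, not taken from the attached preprocessed sources: the function name `count_with_atomic_add` and the loop are assumptions, and it has not been confirmed to trigger the same failure. It performs an atomic update on an 8-bit value inside a target region, which is the operation the failing function `testAtomicAdd` appears to exercise.

```cpp
#include <cstdint>

// Hypothetical reduced test case (an assumption, not from the report's attachments).
// Atomically increments an 8-bit counter inside an OpenMP target region.
// Without OpenMP enabled the pragmas are ignored and the loop runs serially.
std::uint8_t count_with_atomic_add(int iterations) {
    std::uint8_t counter = 0;
    #pragma omp target teams distribute parallel for map(tofrom: counter)
    for (int i = 0; i < iterations; ++i) {
        // Atomic update on a 1-byte object; the backend has to lower this.
        #pragma omp atomic
        counter += 1;
    }
    return counter;
}
```

Compiling such a file with the flags from the report (-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda) would be the way to check whether it hits the same "Cannot select" error.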

Error message from clang:

fatal error: error in backend: Cannot select: 0x6c60f50: i16,ch = AtomicCmpSwap<(load store monotonic monotonic 1 on %ir.135)> 0x6314438, 0x6c60ba8, 0x70cd828, 0x6c60ad8
  0x6c60ba8: i64,ch = CopyFromReg 0x6314438, Register:i64 %36
    0x70cdea8: i64 = Register %36
  0x70cd828: i16,ch = CopyFromReg 0x6314438, Register:i16 %54
    0x6a23440: i16 = Register %54
  0x6c60ad8: i16 = add 0x70cd828, 0x768ec50
    0x70cd828: i16,ch = CopyFromReg 0x6314438, Register:i16 %54
      0x6a23440: i16 = Register %54
    0x768ec50: i16,ch = CopyFromReg 0x6314438, Register:i16 %52
      0x711ec68: i16 = Register %52
In function: _Z13testAtomicAddIN6alpaka3acc10AccCpuOmp4INSt3__117integral_constantImLm1EEEmEEhEvRKTPbT0
clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 9.0.0 (tags/RELEASE_900/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /rwthfs/rz/SW/UTIL/clang/9.0.0/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg: Error generating preprocessed source(s)

Artem-B commented 3 years ago

> @Artem, is this known to be missing in our backend or is there more to it?

Most likely. There wasn't much work done to bring in support for the new instructions (or variants of the existing ones) added by recent CUDA versions. We've recently added mbarrier and cp_async, but that's about it.

Support for atomics in NVPTX back-end in general is known to have holes. I believe there's a bug open for it already, but I can't find it at the moment.

jdoerfert commented 3 years ago

@Artem, is this known to be missing in our backend or is there more to it?

766e2662-641f-4db9-99f5-d4a4dc8c4704 commented 3 years ago

There are two sides to this issue:

  1. CUDA 11 supports atomicCAS on 16-bit integers: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomiccas

  2. No other atomic operation is supported for int16, and no atomics are supported on int8. The same ICE may well be hit in any of these cases too. Instead of crashing, the compiler should emit an error message stating that the atomic in question is not supported on this type by the backend.
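One common way to provide narrow atomics on a target that only has a wider compare-and-swap is to run a CAS loop on the containing word. The sketch below illustrates that idea in portable host C++; it is not the NVPTX backend's lowering or the fix for this bug, and the helper name `fetch_add_u8_via_cas32` is hypothetical. Here `byte_index` selects a byte by bit position within the 32-bit value (0 is the least significant byte), not by memory address.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically adds `delta` to one byte of a 32-bit word
// using only a 32-bit compare-and-swap. Returns the byte's previous value.
// byte_index must be in the range 0..3 (0 = least significant byte).
std::uint8_t fetch_add_u8_via_cas32(std::atomic<std::uint32_t>& word,
                                    unsigned byte_index,
                                    std::uint8_t delta) {
    const unsigned shift = byte_index * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;

    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        // Extract the target byte and compute its new value (wraps mod 256).
        const std::uint8_t old_byte =
            static_cast<std::uint8_t>((old_word & mask) >> shift);
        const std::uint8_t new_byte =
            static_cast<std::uint8_t>(old_byte + delta);

        // Splice the new byte back in, leaving the other three bytes untouched.
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t{new_byte} << shift);

        // On failure, old_word is refreshed with the current value and we retry.
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_relaxed)) {
            return old_byte;
        }
    }
}
```

The relaxed ordering mirrors the monotonic ordering in the failing AtomicCmpSwap node; a real lowering would also have to handle alignment and address-to-byte mapping, which this sketch leaves out.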

JonChesterfield commented 4 years ago

"Cannot select" means a pattern is missing in instruction selection (DAGToDAG). Provided NVPTX can do an atomic cmpswap on i16, this should be a simple fix. I don't know the ISA, so I am reluctant to volunteer to fix it.