sarsanaee opened 2 years ago
Could you try a recent LLVM (llvm14/llvm15)? If that does not fix the issue, could you post a reproducible test case?
This is the same issue as in #59150, e.g. suppose that t.c has an example from that issue:
$ cat t.c
int simple_test(void *ctx)
{
    int tmp = 0;
    __sync_fetch_and_sub(&tmp, 1);
    return tmp;
}
{llvm} 16:39:23 tmp$ clang --target=bpf -mcpu=v2 -O2 t.c -S -o -
.text
.file "t.c"
fatal error: error in backend: Cannot select: t22: i64,ch = AtomicLoadSub<(load store seq_cst (s32) on %ir.tmp)> t21, FrameIndex:i64<0>, Constant:i64<1>
  t6: i64 = FrameIndex<0>
  t19: i64 = Constant<1>
In function: simple_test
...
I remember I changed the mcpu to "prob" or something and then it worked. Never understood why and how.
Well, that is strange; I just tried with "probe" and get the same error. The problem is that the BPF instruction set does not have an atomic fetch-and-subtract, but it does have fetch-and-add. And I don't think that anywhere in the instruction selection phase we currently replace one with the other (negating the argument).
I have the same problem, any suggestions?
u64 tmp, n;
// __sync_fetch_and_sub(&tmp, n);
__sync_fetch_and_add(&tmp, ~n + 1);
Writing it this way may achieve the desired effect.
Please disregard my previous comment.
The following example works for me on current main
and on 16.0.6:
{llvm} 15:52:55 tmp$ cat fetch_and_sub.c
long test(long *p, long val) {
    return __sync_fetch_and_sub(p, val);
}
{llvm} 15:52:59 tmp$ clang --target=bpf -O2 -S -c fetch_and_sub.c -o -
...
r0 = r2
r0 = -r0
r0 = atomic_fetch_add((u64 *)(r1 + 0), r0)
exit
The td pattern to handle this was added by Yonghong some time ago (commit 286daafd6512 "[BPF] support atomic instructions"):
// (fragment from BPFInstrInfo.td)
// atomic_load_sub can be represented as a neg followed
// by an atomic_load_add.
def : Pat<(atomic_load_sub_32 ADDRri:$addr, GPR32:$val),
          (XFADDW32 ADDRri:$addr, (NEG_32 GPR32:$val))>;
def : Pat<(atomic_load_sub_64 ADDRri:$addr, GPR:$val),
          (XFADDD ADDRri:$addr, (NEG_64 GPR:$val))>;
@lavenderfly , could you please specify under which circumstances you still need to do the fetch_and_add workaround?
my clang version is 10, and I'm not allowed to upgrade it. :weary:
Well, if your kernel version supports BPF atomic operations you can try the same inline assembly trick as in BPF selftests here, e.g. something like below:
#define __imm_insn(name, expr) [name]"i"(*(long *)&(expr))
asm volatile (
    ".8byte %[insn];"
    : : __imm_insn(insn, BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0)) : "r0", "memory");
(I did not test this specific incantation, but the .8byte trick is used in BPF selftests here and there; the definition of the BPF_ATOMIC_OP macro comes from <kernel>/tools/include/linux/filter.h.)