[NVPTX] Global of unaligned vector types cause assertion errors and buffer overflows

HazyFish commented 1 year ago

Description

When targeting nvptx or nvptx64, the following code containing v16i20 global crashes backend with

assertion bitPosition < BitWidth && (numBits + bitPosition) <= BitWidth && "Illegal bit extraction" failed during pass NVPTX Assembly Printer in debug build;
memory error corrupted size vs. prev_size during malloc in release build.

The problem does not exist when targeting x86_64, aarch64, riscv64, or wasm64.

Minimal Reproduction

https://godbolt.org/z/jnaqz5qq3

Code

@G = global <16 x i20> <i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7, i20 7>

define void @f(<16 x i64>* %0) {
BB:
  %LGV = load <16 x i20>, <16 x i20>* @G
  %C = sext <16 x i20> %LGV to <16 x i64>
  store <16 x i64> %C, <16 x i64>* %0
  ret void
}

Stack Trace (Debug)

llc: /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Support/APInt.cpp:484: uint64_t llvm::APInt::extractBitsAsZExtValue(unsigned int, unsigned int) const: Assertion `bitPosition < BitWidth && (numBits + bitPosition) <= BitWidth && "Illegal bit extraction"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./llvm-project-latest/build-debug/bin/llc -mtriple=nvptx64 ./crash-reports/dagisel-nvptx64/1.ll
1.  Running pass 'Function Pass Manager' on module './crash-reports/dagisel-nvptx64/1.ll'.
2.  Running pass 'NVPTX Assembly Printer' on function '@f'
 #0 0x00007f8fabb02cfa llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x00007f8fabb02eab PrintStackTraceSignalHandler(void*) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x00007f8fabb01506 llvm::sys::RunSignalHandlers() /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Support/Signals.cpp:104:5
 #3 0x00007f8fabb035d5 SignalHandler(int) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f8faa2c0980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007f8fa95bce87 raise /build/glibc-CVJwZb/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f8fa95be7f1 abort /build/glibc-CVJwZb/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f8fa95ae3fa __assert_fail_base /build/glibc-CVJwZb/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f8fa95ae472 (/lib/x86_64-linux-gnu/libc.so.6+0x30472)
 #9 0x00007f8fab8f7312 llvm::APInt::extractBitsAsZExtValue(unsigned int, unsigned int) const /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Support/APInt.cpp:485:3
#10 0x00007f8fba86555a llvm::NVPTXAsmPrinter::bufferLEByte(llvm::Constant const*, int, llvm::NVPTXAsmPrinter::AggBuffer*)::$_1::operator()(llvm::APInt const&) const /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1821:20
#11 0x00007f8fba865263 llvm::NVPTXAsmPrinter::bufferLEByte(llvm::Constant const*, int, llvm::NVPTXAsmPrinter::AggBuffer*) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1830:7
#12 0x00007f8fba8639c3 llvm::NVPTXAsmPrinter::bufferAggregateConstant(llvm::Constant const*, llvm::NVPTXAsmPrinter::AggBuffer*) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1904:63
#13 0x00007f8fba8628c2 llvm::NVPTXAsmPrinter::printModuleLevelGV(llvm::GlobalVariable const*, llvm::raw_ostream&, bool, llvm::NVPTXSubtarget const&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1178:25
#14 0x00007f8fba85ed19 llvm::NVPTXAsmPrinter::emitGlobals(llvm::Module const&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:834:52
#15 0x00007f8fba85e807 llvm::NVPTXAsmPrinter::emitFunctionEntryLabel() /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:453:5
#16 0x00007f8fb009d9a5 llvm::AsmPrinter::emitFunctionHeader() /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:983:26
#17 0x00007f8fb009fd73 llvm::AsmPrinter::emitFunctionBody() /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1551:3
#18 0x00007f8fba867b4e llvm::AsmPrinter::runOnMachineFunction(llvm::MachineFunction&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/include/llvm/CodeGen/AsmPrinter.h:390:5
#19 0x00007f8fba85f761 llvm::NVPTXAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:487:29
#20 0x00007f8faf288ff5 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/CodeGen/MachineFunctionPass.cpp:91:8
#21 0x00007f8fae441cc6 llvm::FPPassManager::runOnFunction(llvm::Function&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/IR/LegacyPassManager.cpp:1430:23
#22 0x00007f8fae446af2 llvm::FPPassManager::runOnModule(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/IR/LegacyPassManager.cpp:1476:16
#23 0x00007f8fae442599 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/IR/LegacyPassManager.cpp:1545:23
#24 0x00007f8fae44210d llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/IR/LegacyPassManager.cpp:535:16
#25 0x00007f8fae446dd1 llvm::legacy::PassManager::run(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/lib/IR/LegacyPassManager.cpp:1672:3
#26 0x000000000041963c compileModule(char**, llvm::LLVMContext&) /home/henry/aflplusplus-isel/llvm-project-latest/llvm/tools/llc/llc.cpp:736:41
#27 0x00000000004179e2 main /home/henry/aflplusplus-isel/llvm-project-latest/llvm/tools/llc/llc.cpp:417:13
#28 0x00007f8fa959fc87 __libc_start_main /build/glibc-CVJwZb/glibc-2.27/csu/../csu/libc-start.c:344:0
#29 0x00000000004171ea _start (./llvm-project-latest/build-debug/bin/llc+0x4171ea)

Stack Trace (Release)

corrupted size vs. prev_size
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./llvm-project-latest/build-release/bin/llc -mtriple=nvptx64 ./crash-reports/dagisel-nvptx64/1.ll
 #0 0x00007f3e4c972a93 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMSupport.so.16git+0x1b9a93)
 #1 0x00007f3e4c9709be llvm::sys::RunSignalHandlers() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMSupport.so.16git+0x1b79be)
 #2 0x00007f3e4c972f2f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f3e4bc15980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #4 0x00007f3e4af11e87 raise /build/glibc-CVJwZb/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007f3e4af137f1 abort /build/glibc-CVJwZb/glibc-2.27/stdlib/abort.c:81:0
 #6 0x00007f3e4af5c837 __libc_message /build/glibc-CVJwZb/glibc-2.27/libio/../sysdeps/posix/libc_fatal.c:181:0
 #7 0x00007f3e4af638ba /build/glibc-CVJwZb/glibc-2.27/malloc/malloc.c:5342:0
 #8 0x00007f3e4af63abc malloc_consolidate /build/glibc-CVJwZb/glibc-2.27/malloc/malloc.c:4486:0
 #9 0x00007f3e4af67848 _int_malloc /build/glibc-CVJwZb/glibc-2.27/malloc/malloc.c:3713:0
#10 0x00007f3e4af6a0ac malloc /build/glibc-CVJwZb/glibc-2.27/malloc/malloc.c:3068:0
#11 0x00007f3e4b90d298 operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x93298)
#12 0x00007f3e4c89f940 llvm::formatted_raw_ostream::releaseStream() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMSupport.so.16git+0xe6940)
#13 0x00007f3e4c89f75c llvm::formatted_raw_ostream::~formatted_raw_ostream() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMSupport.so.16git+0xe675c)
#14 0x00007f3e4da944d5 (anonymous namespace)::MCAsmStreamer::~MCAsmStreamer() MCAsmStreamer.cpp:0:0
#15 0x00007f3e4da944f9 (anonymous namespace)::MCAsmStreamer::~MCAsmStreamer() MCAsmStreamer.cpp:0:0
#16 0x00007f3e4ed1781e llvm::AsmPrinter::~AsmPrinter() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMAsmPrinter.so.16git+0x3081e)
#17 0x00007f3e573239f2 llvm::NVPTXAsmPrinter::~NVPTXAsmPrinter() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMNVPTXCodeGen.so.16git+0x339f2)
#18 0x00007f3e4e14e328 llvm::FPPassManager::~FPPassManager() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMCore.so.16git+0x203328)
#19 0x00007f3e4e14f968 (anonymous namespace)::MPPassManager::~MPPassManager() LegacyPassManager.cpp:0:0
#20 0x00007f3e4e14ff70 non-virtual thunk to (anonymous namespace)::MPPassManager::~MPPassManager() LegacyPassManager.cpp:0:0
#21 0x00007f3e4e14abc8 llvm::PMTopLevelManager::~PMTopLevelManager() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMCore.so.16git+0x1ffbc8)
#22 0x00007f3e4e14ebb7 llvm::legacy::PassManagerImpl::~PassManagerImpl() (/home/henry/aflplusplus-isel/llvm-project-latest/build-release/bin/../lib/libLLVMCore.so.16git+0x203bb7)
#23 0x000000000040ea17 main (./llvm-project-latest/build-release/bin/llc+0x40ea17)
#24 0x00007f3e4aef4c87 __libc_start_main /build/glibc-CVJwZb/glibc-2.27/csu/../csu/libc-start.c:344:0
#25 0x00000000004092ea _start (./llvm-project-latest/build-release/bin/llc+0x4092ea)

HazyFish commented 1 year ago

cc @DataCorrupted

junaire commented 1 year ago

I think the problem is here (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp#L1817-L1823):

size_t NumBytes = (Val.getBitWidth() + 7) / 8;
    SmallVector<unsigned char, 16> Buf(NumBytes);
    for (unsigned I = 0; I < NumBytes; ++I) {
      Buf[I] = Val.extractBitsAsZExtValue(8, I * 8);
    }

We calculate how many bytes the value contains (rounding up), so i20 got 3 bytes, then we try to zero extend that. However, since the last byte doesn't contain enough bits, so we got a crash.

I can work on this if someone can tell me what the expected behavior is :)

DataCorrupted commented 1 year ago

Thanks for the analysis. If that is correct root cause, you can fix it with one line of code. Just explicitly pad zeros to the head of the Val by zext it Val->zext(NumBytes << 3); before this for loop.

DataCorrupted commented 1 year ago

@Artem-B @junaire After some digging I found the root cause of the crash, but I need some help fixing it.

In short, the size of AggBuffer is aligned as a whole, while the element is not aligned. A <4 x i20> has 80 bits thus getting 10 bytes. However, when coping a constant into the buffer(shown below), we treat each element type as byte aligned, thus <4 x i20> would require 4 x 3 = 12bytes.

https://github.com/llvm/llvm-project/blob/8abd70081f761738e82b37b2891b60ad034f3880/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp#L1822-L1829

This difference is the root cause of the problem. To fix this I can modify AggBuffer to allow insertion of bits, but I am not sure if you will feel like too hacky, as AggBuffer is also designed to be byte aligned.

junaire commented 1 year ago

@Artem-B @junaire After some digging I found the root cause of the crash, but I need some help fixing it.

In short, the size of AggBuffer is aligned as a whole, while the element is not aligned. A <4 x i20> has 80 bits thus getting 10 bytes. However, when coping a constant into the buffer(shown below), we treat each element type as byte aligned, thus <4 x i20> would require 4 x 3 = 12bytes.

https://github.com/llvm/llvm-project/blob/8abd70081f761738e82b37b2891b60ad034f3880/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp#L1822-L1829

This difference is the root cause of the problem. To fix this I can modify AggBuffer to allow insertion of bits, but I am not sure if you will feel like too hacky, as AggBuffer is also designed to be byte aligned.

Hi, I actually tried this before (squash all bits into the AggBuffer without considering their alignment). Besides, I also tried to preallocate enough space for AggBuffer so every element and their alignment could fit. However, unfortunately, both strategies failed (verifier crashes, same backtrace above) :( I bet there's something different with the NVPTX target, so I'd like to know @Artem-B 's ideas...

Artem-B commented 1 year ago

Caveat: I didn't look into the details yet, will do once I'm back at work. The stuff below are my general thoughts on this.

Non-power-of-2 sized integers (let's call them PO2) are not handled particularly well in NVPTX in general. I vaguely recall that we promote scalars up to the next PO2 size, but vectors of such oddly-sized integers are likely to have problems, as you've discovered.

The problem does not exist when targeting x86_64, aarch64, riscv64, or wasm64

That's not a very strong argument.

Would it be nice to have v16i20 work? Yes.
Is v16i20 not working a bug? Not necessarily. Finding IR which works on one back-end, but not the other is not that uncommon. Not all back-ends support everything representable in IR. Sometimes intentionally, sometimes because there was no need. This case is probably the latter.

What to do:

if you need to get things to work ASAP, stick with PO2 types. I.e. don't generate v16i20, use v16i32. It has much better chance of working out of the box. Even if you can express v16i20 in IR, it will not necessarily result in the best code. Most likely it will get chopped into substantial amount of bit-twiddling overhead once legalizer is done with it. Sticking with the types natively supported by the target makes everyone's life much easier.
Plumbing through type promotion for non-PO2 types to next largest POT type should be doable in principle, but so far there's been no need. Whether it's worth it for your particular use case is up to you.

TBH, I'm not even quite sure what would be the right way to store such a vector in memory.

As 16 x 3bytes as one byte is the smallest addressable memory bit?
As 16 x i32 as i32 would be much more efficient for loads/stores?

How does x64 deal with this? One of the implicit assumptions NVPTX users rely on in practice is that in-memory layout matches that of the host and the host sort of happens to be x64, even though these days NVIDIA GPUs are used on PPC and ARM as well. Making sure that we store such a vector the same way we do on x64 would probably make most sense.

llvm / llvm-project