compress/lwz hangs building tests for target=wasi gc=conservative

dgryski commented 2 years ago

~/go/src/github.com/tinygo-org/tinygo $ time tinygo test  -x -target=wasi compress/lzw

^C

real    66m56.459s
user    66m31.609s
sys 0m11.294s

Just realized I had this running for over a hour trying to build. This isn't the interp phase (which would time out) nor the test itself (since we would see the wasmtime command line.). Just building. Maybe an infinite-ish loop somewhere?

dgryski commented 2 years ago

With -gc=leaking, it passes. With -scheduler=none it fails with panic: expected inttoptr. The hang seems to be associated with -gc=conservative.

aykevl commented 2 years ago

How I usually debug these issues is by running it for a while and then killing it with SIGABRT. That will result in a panic-like message with stack trace.

It seems to be stuck in tinygo.org/x/go-llvm._Cfunc_LLVMTargetMachineEmitToMemoryBuffer, which is interesting. Sounds like a LLVM bug.

dgryski commented 2 years ago

Yeah, I found that last night. I wonder if something about the data structure we create is creating a loop in the EmitToMemoryBuffer code, although given that the memory size doesn't seem to be increasing at a huge rate, it's possible it's looping trying to look something up rather than looping trying to add something into the buffer.

aykevl commented 2 years ago

...but I can produce bitcode, which is useful for independent debugging:

FAIL
Command exited with non-zero status 1
83.64user 0.37system 1:23.57elapsed 100%CPU (0avgtext+0avgdata 417308maxresident)k
0inputs+8808outputs (0major+157320minor)pagefaults 0swaps

(The FAIL there is probably a bug, it shouldn't print that with -c, @niaow).

I wonder if something about the data structure we create is creating a loop in the EmitToMemoryBuffer code

That step is basically the whole machine code generation step, so a lot of stuff is happening there.

aykevl commented 2 years ago

Huh, that bitcode file is 4.1MB. That's pretty big. Here is what happens when I run it through llc and kill it after a while:

~/src/github.com/tinygo-org/tinygo$ llc-13 test.bc
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.  Program arguments: llc-13 test.bc
1.  Running pass 'Function Pass Manager' on module 'test.bc'.
2.  Running pass 'WebAssembly Instruction Selection' on function '@"(*compress/lzw.Reader).Reset"'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamEi+0x31)[0x7f2d87a13bb1]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm3sys17RunSignalHandlersEv+0x50)[0x7f2d87a11d70]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(+0xdd60b2)[0x7f2d87a140b2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f2d86c2f730]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(+0x135b56a)[0x7f2d87f9956a]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(+0x1359eea)[0x7f2d87f97eea]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm12SelectionDAG7CombineENS_12CombineLevelEPNS_9AAResultsENS_10CodeGenOpt5LevelE+0x903)[0x7f2d87f99233]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv+0xf6)[0x7f2d88163656]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE+0x1865)[0x7f2d88162f35]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE+0x7cc)[0x7f2d88160a8c]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE+0xfe)[0x7f2d87d5b55e]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x390)[0x7f2d87b48940]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x33)[0x7f2d87b4e533]
/lib/x86_64-linux-gnu/libLLVM-13.so.1(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x3a8)[0x7f2d87b48f48]
llc-13(main+0x209a)[0x40fc1a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f2d8675f09b]
llc-13(_start+0x2a)[0x40d1ba]

aykevl commented 2 years ago

Trying bugpoint now. It's rather frustrating because I need to keep killing the process every time after 10 seconds or so.

dgryski commented 2 years ago

I'm trying to see if it happens with llvm 12, looking at the commit just before llvm13 support was merged into tinygo.

aykevl commented 2 years ago

Getting close. The problem is (*compress/lzw.Reader).Reset.

aykevl commented 2 years ago

FYI: I'm running bugpoint-13 --compile-custom llc-13 test.bc (with test.bc from tinygo test -c ...).

aykevl commented 2 years ago

Actually, there's a much better command: bugpoint-13 --compile-custom llc-13 --timeout 1 test.bc. That timeout flag is very useful! I can now go and watch it figure out what kind of code is causing the crash.

dgryski commented 2 years ago

Still happens with llvm12 at the checkout before llvm13 change landed. Building llvm11 and we'll see if it happens there too..

dgryski commented 2 years ago

Still happens with llvm11 when the testenv stub was merged. Seems like this never worked, we just never got around to testing / noticing.

aykevl commented 2 years ago

Here is the buggy file: test.zip

Extract it, and run llc-13 test.bc (or however your variant of llc is called).

dgryski commented 2 years ago

Trying to minimize the lwz test source code, after a while I decided to try with -opt=0 and got the following error instead:

Caused by:
    0: WebAssembly translation error
    1: Invalid input WebAssembly code at offset 2573209: locals exceed maximum

Will keep plugging away at TestReaderReset ...

aykevl commented 2 years ago

I think I might see what's going on here. Let me check...

aykevl commented 2 years ago

Ok, so I think what's going wrong here.

The Reset function itself looks pretty harmless:

func (r *Reader) Reset(src io.Reader, order Order, litWidth int) {
    *r = Reader{}
    r.init(src, order, litWidth)
}

The issue however is that Reader is a really, really large struct. Like 20kB in size. Unfortunately, in Go SSA it looks like this:

# Name: (*compress/lzw.Reader).Reset
# Package: compress/lzw
# Location: /usr/local/go1.17/src/compress/lzw/reader.go:237:18
# Locals:
#   0:  t0 Reader
func (r *Reader) Reset(src io.Reader, order Order, litWidth int):
0:                                                                entry P:0 S:0
    ; var r *compress/lzw.Reader @ 238:3 is r
    ; address of *ast.StarExpr @ 238:2 is r
    t0 = local Reader ()                                            *Reader
    t1 = *t0                                                         Reader
    ; address of *ast.CompositeLit @ 238:7 is r
    *r = t1
    ; var r *compress/lzw.Reader @ 239:2 is r
    ; var src io.Reader @ 239:9 is src
    ; var order compress/lzw.Order @ 239:14 is order
    ; var litWidth int @ 239:21 is litWidth
    t2 = (*Reader).init(r, src, order, litWidth)                         ()
    ; *ast.CallExpr @ 239:2 is t2
    return

That would be roughly equivalent to the following Go code:

func (r *Reader) Reset(src io.Reader, order Order, litWidth int):
    var t0 Reader{}
    t1 = *t0
    *r = t1
    r.init(src, order, litWidth)                         ()
    return

So it allocates a whole Reader{} (of 20kB!), loads it, and stores it to the r. This is lowered more or less directly to LLVM IR as-is, but LLVM has an optimization that converts large loads/stores to lots of small loads/stores because processors can't do these large loads/stores directly anyway and it usually makes later optimizations more effective.

Clang does this a bit differently. For any zero store, it calls the LLVM memset function intrinsic which is later converted back to a regular store instruction or to an actual memset call. So Clang avoids this issue.

...I can't think of a good fix right away.

dgryski commented 2 years ago

Can we call the memset intrinsic ourselves somehow?

aykevl commented 2 years ago

The problem is that this information is lost in Go SSA form, unless we recompute it. Which is a bit difficult.

aykevl commented 2 years ago

But yeah, once we know that a given store stores a zero value, we can trivially convert that to a memset. In fact, I already did that (but of course it didn't help):

diff --git a/compiler/compiler.go b/compiler/compiler.go
index 6c0c98b2..14375e48 100644
--- a/compiler/compiler.go
+++ b/compiler/compiler.go
@@ -1192,6 +1199,14 @@ func (b *builder) createInstruction(instr ssa.Instruction) {
                        // nothing to store
                        return
                }
+               if llvmVal == llvm.ConstNull(llvmVal.Type()) {
+                       if llvmAddr.Type() != b.i8ptrType {
+                               llvmAddr = b.CreateBitCast(llvmAddr, b.i8ptrType, "")
+                       }
+                       size := llvm.ConstInt(b.uintptrType, b.targetData.TypeAllocSize(llvmVal.Type()), false)
+                       b.createMemset0Call(llvmAddr, size)
+                       return
+               }
                b.CreateStore(llvmVal, llvmAddr)
        default:
                b.addError(instr.Pos(), "unknown instruction: "+instr.String())
--- a/compiler/intrinsics.go
+++ b/compiler/intrinsics.go
@@ -33,6 +33,13 @@ func (b *builder) createMemoryCopyCall(fn *ssa.Function, args []ssa.Value) (llvm
 // memory, declaring the function if needed. These calls will be lowered to
 // regular libc memset calls if they aren't optimized out in a different way.
 func (b *builder) createMemoryZeroCall(args []ssa.Value) (llvm.Value, error) {
+       ptr := b.getValue(args[0])
+       len := b.getValue(args[1])
+       b.createMemset0Call(ptr, len)
+       return llvm.Value{}, nil
+}
+
+func (b *builder) createMemset0Call(ptr, len llvm.Value) {
        fnName := "llvm.memset.p0i8.i" + strconv.Itoa(b.uintptrType.IntTypeWidth())
        llvmFn := b.mod.NamedFunction(fnName)
        if llvmFn.IsNil() {
@@ -40,13 +47,12 @@ func (b *builder) createMemoryZeroCall(args []ssa.Value) (llvm.Value, error) {
                llvmFn = llvm.AddFunction(b.mod, fnName, fnType)
        }
        params := []llvm.Value{
-               b.getValue(args[0]),
+               ptr,
                llvm.ConstInt(b.ctx.Int8Type(), 0, false),
-               b.getValue(args[1]),
+               len,
                llvm.ConstInt(b.ctx.Int1Type(), 0, false),
        }
        b.CreateCall(llvmFn, params, "")
-       return llvm.Value{}, nil
 }

 var mathToLLVMMapping = map[string]string{

dgryski commented 2 years ago

So this could hit any other large stack-allocated structure?

aykevl commented 2 years ago

Yes.

dgryski commented 2 years ago

So why doesn't this trigger with gc=leaking?

dgryski commented 2 years ago

The reason your patch didn't work was because the if wasn't triggering. It looks like the llvm.ConstNull that's stored from the ssa.Alloc in createExpr isn't matching when we check later in the ssa.Store case.

aykevl commented 2 years ago

The reason your patch didn't work was because the if wasn't triggering. It looks like the llvm.ConstNull that's stored from the ssa.Alloc in createExpr isn't matching when we check later in the ssa.Store case.

Yes. That's because it's not a constant (as I expected) but a load from that alloca (ssa.Alloc). A load is an instruction, not a constant.

dgryski commented 2 years ago

Last night I was playing with trying to detect "if the previous use of this ssa value was in an alloc" , but rapidly ran into my knowledge of both llvm and Go's SSA package. Just an idea, anyway.

dgryski commented 1 year ago

(Just to update: This is still happening)

dgryski commented 1 year ago

This is now fixed (but not sure why/how).

dgryski commented 1 year ago

https://github.com/tinygo-org/tinygo/pull/3628 <-- compress/lzw now runs tests under wasi.

deadprogram commented 1 year ago

This is part of the v0.28 release so now closing this issue. Thanks!

tinygo-org / tinygo

compress/lwz hangs building tests for target=wasi gc=conservative #2525