kothar / brotli-go

Not maintained -> see itch.io's more up-to-date fork 👉
https://github.com/itchio/go-brotli
MIT License
82 stars 9 forks source link

Go GC is not happy #14

Closed dsnet closed 8 years ago

dsnet commented 8 years ago

Using go1.5.1

runtime:objectstart Span weird: p=0xc8206a6000 k=0x6410353 s.start=0xc8206a6000 s.limit=0xc8205c0000 s.state=2
fatal error: objectstart: bad pointer in unexpected span

goroutine 8 [running]:
runtime.throw(0x6c9b20, 0x2b)
    /usr/local/go/src/runtime/panic.go:527 +0x90 fp=0xc820030e18 sp=0xc820030e00
runtime.heapBitsForObject(0xc8206a6000, 0x0, 0x0, 0xc800000000, 0x7f1e1595fe30)
    /usr/local/go/src/runtime/mbitmap.go:217 +0x287 fp=0xc820030e50 sp=0xc820030e18
runtime.scanobject(0xc8205ae000, 0xc820025c20)
    /usr/local/go/src/runtime/mgcmark.go:878 +0x239 fp=0xc820030f20 sp=0xc820030e50
runtime.gcDrain(0xc820025c20, 0x7d0)
    /usr/local/go/src/runtime/mgcmark.go:689 +0xf4 fp=0xc820030f50 sp=0xc820030f20
runtime.gcBgMarkWorker(0xc820024a00)
    /usr/local/go/src/runtime/mgc.go:1320 +0x281 fp=0xc820030fb8 sp=0xc820030f50
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820030fc0 sp=0xc820030fb8
created by runtime.gcBgMarkStartWorkers
    /usr/local/go/src/runtime/mgc.go:1239 +0x93

goroutine 1 [chan receive]:
testing.RunTests(0x6de8a8, 0x97daa0, 0x3, 0x3, 0xc820053e01)
    /usr/local/go/src/testing/testing.go:562 +0x8ad
testing.(*M).Run(0xc820045f08, 0xc820073520)
    /usr/local/go/src/testing/testing.go:494 +0x70
main.main()
    _/home/rawr/Projects/compress/internal/benchmark/_test/_testmain.go:58 +0x116

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 5 [runnable, locked to thread]:
gopkg.in/kothar/brotli-go.v0/dec._Cfunc_BrotliDecompressStream(0xc8205afb10, 0xc8205afb18, 0xc8205a4000, 0xc8205a6000, 0xc8205afb20, 0xc8205ae000, 0x2)
    ??:0 +0x39
gopkg.in/kothar/brotli-go.v0/dec.(*BrotliReader).Read(0xc8205ae000, 0xc82022c000, 0x8000, 0x8000, 0x8000, 0x0, 0x0)
    /home/rawr/.go/src/gopkg.in/kothar/brotli-go.v0/dec/decode.go:122 +0x38e
io.copyBuffer(0x7f1e13d379b0, 0xc82057e100, 0x7f1e13d379d8, 0xc8205ae000, 0xc82022c000, 0x8000, 0x8000, 0x10000, 0x0, 0x0)
    /usr/local/go/src/io/io.go:381 +0x247
io.Copy(0x7f1e13d379b0, 0xc82057e100, 0x7f1e13d379d8, 0xc8205ae000, 0xc8205ae000, 0x0, 0x0)
    /usr/local/go/src/io/io.go:351 +0x64
_/home/rawr/Projects/compress/internal/benchmark.testRoundTrip(0xc82009c120, 0x6de6e8, 0x6de6f0)
    /home/rawr/Projects/compress/internal/benchmark/common_test.go:47 +0x102e
_/home/rawr/Projects/compress/internal/benchmark.TestCGoRoundTripBrotli(0xc82009c120)
    /home/rawr/Projects/compress/internal/benchmark/cgo_brotlib_test.go:12 +0x117
testing.tRunner(0xc82009c120, 0x97daa0)
    /usr/local/go/src/testing/testing.go:456 +0x98
created by testing.RunTests
    /usr/local/go/src/testing/testing.go:561 +0x86d
exit status 2
FAIL    _/home/rawr/Projects/compress/internal/benchmark    0.094s

I actually struggled with a very similar problem when I was wrapping zlib. You might want to take a look at what cgzip did to make their cgo stuff happy with go1.5. Specifically the comment here.

dsnet commented 8 years ago

I'm no expert on cgo stuff, so I started a thread on go-nuts about this: https://groups.google.com/forum/#!topic/golang-nuts/Pw90LA0ksMs

kothar commented 8 years ago

Aha, that does kind of make sense. When I implemented the CompressBuffer function, I asked on the mailing list about whether to allocate memory on the C side or the Go side, and the advice was that passing a pointer to Go-allocated memory was preferred. But in that case it was just a byte array, so no pointers to trip up on.

fasterthanlime commented 8 years ago

So, for now, we should use that (opaque buffer) trick on BrotliState?

Let's try that.

fasterthanlime commented 8 years ago

Got it, PR incoming!

kothar commented 8 years ago

It'll be interesting to follow any discussion on that thread - it seems odd that the GC would explore C structs for pointers, but I suppose you could in theory allocate everything on the Go side and not have any unsafe pointers. Clearly we can't do that here.

fasterthanlime commented 8 years ago

@kothar After #16, BrotliState is still allocated on the Go side (and then initialized via the provided C function), the difference is: Go now assumes it's an array of chars, so it doesn't scan it.

Afaict the bug is not that the Go GC scans C structs for pointers to go structs, it's that it reports pointers from C structs to C-allocated blocks as 'invalid pointers' because they don't point to go-allocated memory

kothar commented 8 years ago

This doesn't seem to be completely fixed - the latest Travis build fails for darwin with the same Span Weird error. I can't reproduce it on my local OS X system.

fasterthanlime commented 8 years ago

@kothar can you try restarting the build on travis-ci ? (as project owner, you should see a 'Restart build' button on this page)

I suspect it's running pre-#16 code

kothar commented 8 years ago

Is that a common problem? That build did seem to be referencing #16

fasterthanlime commented 8 years ago

@kothar hey look it passed :)

fasterthanlime commented 8 years ago

Is that a common problem? That build did seem to be referencing #16

My money is on the tests referencing gopkg.in rather than the local version (that would be an issue for branch/PR builds) — but in that case it shouldn't have made a difference since it was all on master, so v0 should be the exact same

I've had some weird OSX issues, but this particular one is the first

fasterthanlime commented 8 years ago

@kothar continued discussion in #17

dsnet commented 8 years ago

Using 2f94aa4da98e38f4506470ffb045ba95a991023a

This problem still persists. I don't have time right now, but I can look into it this weekend.

runtime:objectstart Span weird: p=0xc820f58000 k=0x64107ac s.start=0xc820f58000 s.limit=0xc820e4a000 s.state=2
fatal error: objectstart: bad pointer in unexpected span

runtime stack:
runtime.throw(0x738520, 0x2b)
    /usr/local/go/src/runtime/panic.go:527 +0x90 fp=0x7f9753651c80 sp=0x7f9753651c68
runtime.heapBitsForObject(0xc820f58000, 0x0, 0x0, 0x0, 0x7f9753e606d8)
    /usr/local/go/src/runtime/mbitmap.go:217 +0x287 fp=0x7f9753651cb8 sp=0x7f9753651c80
runtime.scanobject(0xc820900010, 0x7f9753651dd8)
    /usr/local/go/src/runtime/mgcmark.go:878 +0x239 fp=0x7f9753651d88 sp=0x7f9753651cb8
runtime.gcDrain(0x7f9753651dd8, 0xffffffffffffffff)
    /usr/local/go/src/runtime/mgcmark.go:689 +0xf4 fp=0x7f9753651db8 sp=0x7f9753651d88
runtime.gchelper()
    /usr/local/go/src/runtime/mgc.go:1704 +0x89 fp=0x7f9753651e00 sp=0x7f9753651db8
runtime.stopm()
    /usr/local/go/src/runtime/proc1.go:1131 +0x146 fp=0x7f9753651e28 sp=0x7f9753651e00
runtime.exitsyscall0(0xc82006a900)
    /usr/local/go/src/runtime/proc1.go:2138 +0x16c fp=0x7f9753651e50 sp=0x7f9753651e28
runtime.mcall(0x0)
    /usr/local/go/src/runtime/asm_amd64.s:204 +0x5b fp=0x7f9753651e60 sp=0x7f9753651e50
kothar commented 8 years ago

From the golang-nuts discussion, it sounds like even the char buffer approach is suspect, although the problem they describe is to do with go pointers kept in C code.

Maybe the other pointers are also being held onto, such as next_in. I will have a more thorough dig through the code too.

kothar commented 8 years ago

Ah, maybe it is the fact that the decoder increments the next_in pointer, so it no longer points to the beginning of the buffer.

dsnet commented 8 years ago

That sounds like a very likely culprit.

kothar commented 8 years ago

I haven't been able to reproduce this locally yet, so I've uploaded a couple of branches to try to isolate the problem:

https://github.com/kothar/brotli-go/tree/gc_errors - attempts to trigger the error. It's actually giving a different error at the moment due to the stream never returning io.EOF for some reason, and continuing to read bytes from the stream.

https://github.com/kothar/brotli-go/tree/gc_errors_fix - removes the last stored pointer into Go-allocated memory in BrotliReader and re-calculates it on each call from availableBytes.

Any thoughts on reproducibility?

fasterthanlime commented 8 years ago

@kothar I think that for loop isn't doing what you think it does:

for read, err := reader.Read(decoded); err != io.EOF; {
  // ...
}

I think it only calls reader.Read once and keeps adding the initial amount read to count

fasterthanlime commented 8 years ago

for loop

fixed in https://github.com/itchio/brotli-go/commit/6a2a5d56d10c82ad7ceb957ff79f2bfa13065f62

fasterthanlime commented 8 years ago

My fork's gc_errors branch reliably reproduces this issue for me (failing Travis build)

And @kothar's fix fixes it for me :) (passing Travis build)

The only noteworthy things in my fork are:

fasterthanlime commented 8 years ago

cf. PR #21

kothar commented 8 years ago

Aha, thanks for fixing the test, definitely hadn't had enough coffee this morning.

fasterthanlime commented 8 years ago

@kothar I'm really not fond of golang's "no parenthesis for if/for" policy — I can see you meant to write a statement list but clearly the compiler thought you were specifying for clauses.. don't blame caffeine (or lack thereof)! :coffee:

kothar commented 8 years ago

Sigh. https://travis-ci.org/kothar/brotli-go/jobs/92066131

Still failing in decompressor.

fasterthanlime commented 8 years ago

Currently trying to reproduce by explicitly running the GC in the roundtrip test

fasterthanlime commented 8 years ago

Fffinnaaally managed to reproduce, after over six thousand runs of the decompression test... log here.

With a bug this rare, I honestly don't know how we can ever make sure we've fixed it :(

kothar commented 8 years ago

So, this seems to happen because the GC runs during the C function:

https://gist.github.com/fasterthanlime/057793b5e5b09d361139#file-issue-14-log-txt-L178

If I put a GC immediately after the call to the c function, it causes the panic.

SO, I think the fix is to only pass buffer pointers into a new C function, so that the GC can lock them, and then make the nextIn and nextOut pointers exist only on the C side of the binding so that the GC never sees them.

Working on a PR now.

fasterthanlime commented 8 years ago

@kothar oh riiight the Go GC will also scan the stack, which is where function arguments live!

kothar commented 8 years ago

Maybe this time...

dsnet commented 8 years ago

I haven't looked at the implementation, but I ran one of my very intensive benchmarks on it and it didn't crash. Good work!