Closed dsnet closed 8 years ago
I'm no expert on cgo stuff, so I started a thread on go-nuts about this: https://groups.google.com/forum/#!topic/golang-nuts/Pw90LA0ksMs
Aha, that does kind of make sense. When I implemented the CompressBuffer function, I asked on the mailing list about whether to allocate memory on the C side or the Go side, and the advice was that passing a pointer to Go-allocated memory was preferred. But in that case it was just a byte array, so no pointers to trip up on.
So, for now, we should use that (opaque buffer) trick on BrotliState?
Let's try that.
Got it, PR incoming!
It'll be interesting to follow any discussion on that thread - it seems odd that the GC would explore C structs for pointers, but I suppose you could in theory allocate everything on the Go side and not have any unsafe pointers. Clearly we can't do that here.
@kothar After #16, BrotliState is still allocated on the Go side (and then initialized via the provided C function), the difference is: Go now assumes it's an array of chars, so it doesn't scan it.
Afaict the bug is not that the Go GC scans C structs for pointers to go structs, it's that it reports pointers from C structs to C-allocated blocks as 'invalid pointers' because they don't point to go-allocated memory
This doesn't seem to be completely fixed - the latest Travis build fails for darwin with the same Span Weird error. I can't reproduce it on my local OS X system.
@kothar can you try restarting the build on travis-ci ? (as project owner, you should see a 'Restart build' button on this page)
I suspect it's running pre-#16 code
Is that a common problem? That build did seem to be referencing #16
@kothar hey look it passed :)
Is that a common problem? That build did seem to be referencing #16
My money is on the tests referencing gopkg.in
rather than the local version (that would be an issue for branch/PR builds) — but in that case it shouldn't have made a difference since it was all on master, so v0 should be the exact same
I've had some weird OSX issues, but this particular one is the first
@kothar continued discussion in #17
Using 2f94aa4da98e38f4506470ffb045ba95a991023a
This problem still persists. I don't have time right now, but I can look into it this weekend.
runtime:objectstart Span weird: p=0xc820f58000 k=0x64107ac s.start=0xc820f58000 s.limit=0xc820e4a000 s.state=2
fatal error: objectstart: bad pointer in unexpected span
runtime stack:
runtime.throw(0x738520, 0x2b)
/usr/local/go/src/runtime/panic.go:527 +0x90 fp=0x7f9753651c80 sp=0x7f9753651c68
runtime.heapBitsForObject(0xc820f58000, 0x0, 0x0, 0x0, 0x7f9753e606d8)
/usr/local/go/src/runtime/mbitmap.go:217 +0x287 fp=0x7f9753651cb8 sp=0x7f9753651c80
runtime.scanobject(0xc820900010, 0x7f9753651dd8)
/usr/local/go/src/runtime/mgcmark.go:878 +0x239 fp=0x7f9753651d88 sp=0x7f9753651cb8
runtime.gcDrain(0x7f9753651dd8, 0xffffffffffffffff)
/usr/local/go/src/runtime/mgcmark.go:689 +0xf4 fp=0x7f9753651db8 sp=0x7f9753651d88
runtime.gchelper()
/usr/local/go/src/runtime/mgc.go:1704 +0x89 fp=0x7f9753651e00 sp=0x7f9753651db8
runtime.stopm()
/usr/local/go/src/runtime/proc1.go:1131 +0x146 fp=0x7f9753651e28 sp=0x7f9753651e00
runtime.exitsyscall0(0xc82006a900)
/usr/local/go/src/runtime/proc1.go:2138 +0x16c fp=0x7f9753651e50 sp=0x7f9753651e28
runtime.mcall(0x0)
/usr/local/go/src/runtime/asm_amd64.s:204 +0x5b fp=0x7f9753651e60 sp=0x7f9753651e50
From the golang-nuts discussion, it sounds like even the char buffer approach is suspect, although the problem they describe is to do with go pointers kept in C code.
Maybe the other pointers are also being held onto, such as next_in. I will have a more thorough dig through the code too.
Ah, maybe it is the fact that the decoder increments the next_in pointer, so it no longer points to the beginning of the buffer.
That sounds like a very likely culprit.
I haven't been able to reproduce this locally yet, so I've uploaded a couple of branches to try to isolate the problem:
https://github.com/kothar/brotli-go/tree/gc_errors - attempts to trigger the error. It's actually giving a different error at the moment due to the stream never returning io.EOF for some reason, and continuing to read bytes from the stream.
https://github.com/kothar/brotli-go/tree/gc_errors_fix - removes the last stored pointer into Go-allocated memory in BrotliReader and re-calculates it on each call from availableBytes.
Any thoughts on reproducibility?
@kothar I think that for loop isn't doing what you think it does:
for read, err := reader.Read(decoded); err != io.EOF; {
// ...
}
I think it only calls reader.Read
once and keeps adding the initial amount read to count
My fork's gc_errors
branch reliably reproduces this issue for me (failing Travis build)
And @kothar's fix fixes it for me :) (passing Travis build)
The only noteworthy things in my fork are:
testdata
rather than a trivial quick brown fox repetitioncf. PR #21
Aha, thanks for fixing the test, definitely hadn't had enough coffee this morning.
@kothar I'm really not fond of golang's "no parenthesis for if/for" policy — I can see you meant to write a statement list but clearly the compiler thought you were specifying for clauses.. don't blame caffeine (or lack thereof)! :coffee:
Sigh. https://travis-ci.org/kothar/brotli-go/jobs/92066131
Still failing in decompressor.
Currently trying to reproduce by explicitly running the GC in the roundtrip test
Fffinnaaally managed to reproduce, after over six thousand runs of the decompression test... log here.
With a bug this rare, I honestly don't know how we can ever make sure we've fixed it :(
So, this seems to happen because the GC runs during the C function:
https://gist.github.com/fasterthanlime/057793b5e5b09d361139#file-issue-14-log-txt-L178
If I put a GC immediately after the call to the c function, it causes the panic.
SO, I think the fix is to only pass buffer pointers into a new C function, so that the GC can lock them, and then make the nextIn and nextOut pointers exist only on the C side of the binding so that the GC never sees them.
Working on a PR now.
@kothar oh riiight the Go GC will also scan the stack, which is where function arguments live!
Maybe this time...
I haven't looked at the implementation, but I ran one of my very intensive benchmarks on it and it didn't crash. Good work!
Using
go1.5.1
I actually struggled with a very similar problem when I was wrapping zlib. You might want to take a look at what cgzip did to make their cgo stuff happy with go1.5. Specifically the comment here.