grafana / pyroscope

Continuous Profiling Platform. Debug performance issues down to a single line of code
https://grafana.com/oss/pyroscope/
GNU Affero General Public License v3.0

chore(v2): compactor concurrency #3628

Closed korniltsev closed 1 month ago

korniltsev commented 1 month ago

I hit a mysterious panic in the dev environment:

unexpected fault address 0xc01d800000
2024-10-16T06:33:53Z debug layer=debugger callInjection protocol on:
2024-10-16T06:33:53Z debug layer=debugger   17 PC=0x487ab7
2024-10-16T06:33:53Z debug layer=debugger   18 PC=0x4880a3
2024-10-16T06:33:53Z debug layer=debugger   19 PC=0x53ab5d
2024-10-16T06:33:53Z debug layer=debugger   20 PC=0x4880a3
2024-10-16T06:33:53Z debug layer=debugger   21 PC=0x4880a3
2024-10-16T06:33:53Z debug layer=debugger   22 PC=0x48dbee
2024-10-16T06:33:53Z debug layer=debugger   23 PC=0x487ab7
2024-10-16T06:33:53Z debug layer=debugger   25 PC=0x4880a3
2024-10-16T06:33:53Z debug layer=debugger   27 PC=0x4880a3
2024-10-16T06:33:53Z debug layer=debugger   28 PC=0x47dd84
2024-10-16T06:33:53Z debug layer=debugger   12 PC=0x48dbee
2024-10-16T06:33:53Z debug layer=debugger   24 PC=0x4880a3
2024-10-16T06:33:53Z debug layer=debugger   26 PC=0x48dbee

** execution is paused because your program is panicking **
To continue the execution please connect your client to the debugger.
Stack trace:
 0  0x000000000047dd84 in runtime.throw
    at /home/korniltsev/sdk/go1.23.0/src/runtime/panic.go:1058
 1  0x0000000000480185 in runtime.sigpanic
    at /home/korniltsev/sdk/go1.23.0/src/runtime/signal_unix.go:914
 2  0x0000000002a3a391 in github.com/dgryski/go-groupvarint.Decode4
    at /home/korniltsev/go/pkg/mod/github.com/dgryski/go-groupvarint@v0.0.0-20230630160417-2bfb7969fb3c/decode_amd64.s:11
 3  0x0000000002a76573 in github.com/grafana/pyroscope/pkg/phlaredb/symdb.decodeU32Groups
    at /home/korniltsev/p/pyroscope/pkg/phlaredb/symdb/stacktrace_tree.go:388
 4  0x0000000002a75efc in github.com/grafana/pyroscope/pkg/phlaredb/symdb.(*treeDecoder).unmarshal
    at /home/korniltsev/p/pyroscope/pkg/phlaredb/symdb/stacktrace_tree.go:348
 5  0x0000000002a74eae in github.com/grafana/pyroscope/pkg/phlaredb/symdb.(*parentPointerTree).ReadFrom
    at /home/korniltsev/p/pyroscope/pkg/phlaredb/symdb/stacktrace_tree.go:228
 6  0x0000000002a42698 in github.com/grafana/pyroscope/pkg/phlaredb/symdb.(*stacktraceBlock).readFrom
    at /home/korniltsev/p/pyroscope/pkg/phlaredb/symdb/block_reader.go:600
 7  0x0000000002a4211b in github.com/grafana/pyroscope/pkg/phlaredb/symdb.(*stacktraceBlock).fetch.func1
    at /home/korniltsev/p/pyroscope/pkg/phlaredb/symdb/block_reader.go:572
 8  0x0000000002796bf9 in github.com/grafana/pyroscope/pkg/util/refctr.(*Counter).Inc
    at /home/korniltsev/p/pyroscope/pkg/util/refctr/refctr.go:31
 9  0x0000000002a41c99 in github.com/grafana/pyroscope/pkg/phlaredb/symdb.(*stacktraceBlock).fetch
    at /home/korniltsev/p/pyroscope/pkg/phlaredb/symdb/block_reader.go:558
10  0x0000000002a42f83 in github.com/grafana/pyroscope/pkg/phlaredb/symdb.(*fetchTx).fetch.func2
    at /home/korniltsev/p/pyroscope/pkg/phlaredb/symdb/block_reader.go:700
11  0x00000000016fe489 in golang.org/x/sync/errgroup.(*Group).Go.func1
    at /home/korniltsev/go/pkg/mod/golang.org/x/sync@v0.8.0/errgroup/errgroup.go:78
12  0x00000000004862a1 in runtime.goexit
    at /home/korniltsev/sdk/go1.23.0/src/runtime/asm_amd64.s:1700

I can neither reproduce this locally nor make sense of it. I suspect it is a bug in go1.23.0, which I used locally. I am going to ignore the panic and merge the PR, since the PR should not be the cause. I will monitor dev after the merge.

kolesnikovae commented 1 month ago

Thanks for sharing this!

Interesting: the assembly is generated by PeachPy, which reminds me of the recent issue with BP clobbering (though that is not the case this time): https://github.com/dgryski/go-groupvarint/blob/master/decode_amd64.s

I think I'll replace groupvarint with something more efficient.
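
For illustration only, here is a minimal pure-Go sketch of group varint encoding and decoding in which every read is bounds-checked, so truncated input yields an error instead of a fault. It is not the go-groupvarint implementation and not necessarily what a replacement in Pyroscope would look like. The names `encode4`, `decode4`, and `errShortBuffer` are hypothetical, and the tag layout (value i's length minus one stored in bits 2i..2i+1, values little-endian) is an assumption that may not be byte-compatible with the library.

```go
package main

import (
	"errors"
	"fmt"
)

// errShortBuffer is a hypothetical error for truncated input.
var errShortBuffer = errors.New("groupvarint: short buffer")

// encode4 appends one group to dst: a tag byte followed by four values,
// each stored little-endian in 1 to 4 bytes.
// Assumed layout: bits 2i..2i+1 of the tag hold (byte length of value i) - 1.
func encode4(dst []byte, src [4]uint32) []byte {
	tagPos := len(dst)
	dst = append(dst, 0) // placeholder for the tag byte
	var tag byte
	for i, v := range src {
		n := 1
		switch {
		case v >= 1<<24:
			n = 4
		case v >= 1<<16:
			n = 3
		case v >= 1<<8:
			n = 2
		}
		tag |= byte(n-1) << (2 * i)
		for j := 0; j < n; j++ {
			dst = append(dst, byte(v>>(8*j)))
		}
	}
	dst[tagPos] = tag
	return dst
}

// decode4 decodes one group from src into dst and returns the number of
// bytes consumed. It never reads past len(src).
func decode4(dst *[4]uint32, src []byte) (int, error) {
	if len(src) < 1 {
		return 0, errShortBuffer
	}
	tag := src[0]
	pos := 1
	for i := 0; i < 4; i++ {
		n := int(tag>>(2*i))&3 + 1 // byte length of value i
		if len(src)-pos < n {
			return 0, errShortBuffer
		}
		var v uint32
		for j := 0; j < n; j++ {
			v |= uint32(src[pos+j]) << (8 * j)
		}
		dst[i] = v
		pos += n
	}
	return pos, nil
}

func main() {
	in := [4]uint32{1, 300, 70000, 1 << 30}
	buf := encode4(nil, in)

	var out [4]uint32
	n, err := decode4(&out, buf)
	fmt.Println(n, out, err)

	// Dropping the last byte makes the final value incomplete.
	_, err = decode4(&out, buf[:len(buf)-1])
	fmt.Println(err)
}
```

The per-byte loop trades speed for safety; a faster variant would typically use wide loads, which is exactly where the caller has to guarantee enough readable bytes after the last group.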

kolesnikovae commented 1 month ago

I wouldn't discount a trivial bug on our end, though – I'll take a closer look

korniltsev commented 1 month ago

Yeah, it reminded me of the bug you shared.

But the crash seems to happen while reading relative to the rbx register, so it looks different.

I will try to find something as well