matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: invalid memory address or nil pointer dereference in some operators #18177

Closed: jensenojs closed this issue 1 week ago

jensenojs commented 1 month ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

ce7fdc2f51f122

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

2024/08/16 07:20:54.830782 +0000 ERROR pipeline/pipeline.go:104 panic in pipeline: internal error: panic runtime error: invalid memory address or nil pointer dereference: 
runtime.panicmem
    /usr/local/go/src/runtime/panic.go:261
runtime.sigpanic
    /usr/local/go/src/runtime/signal_unix.go:881
github.com/matrixorigin/matrixone/pkg/sql/colexec/shuffle.(*Shuffle).Call
    /home/jensen/matrixorigin/matrixone/pkg/sql/colexec/shuffle/shuffle.go:101
github.com/matrixorigin/matrixone/pkg/sql/colexec/connector.(*Connector).Call
    /home/jensen/matrixorigin/matrixone/pkg/sql/colexec/connector/connector.go:47
github.com/matrixorigin/matrixone/pkg/vm/pipeline.(*Pipeline).run
    /home/jensen/matrixorigin/matrixone/pkg/vm/pipeline/pipeline.go:91
github.com/matrixorigin/matrixone/pkg/vm/pipeline.(*Pipeline).MergeRun
    /home/jensen/matrixorigin/matrixone/pkg/vm/pipeline/pipeline.go:78
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).MergeRun
    /home/jensen/matrixorigin/matrixone/pkg/sql/compile/scope.go:302
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).MergeRun.func1
    /home/jensen/matrixorigin/matrixone/pkg/sql/compile/scope.go:262
github.com/panjf2000/ants/v2.(*goWorker).run.func1
    /home/jensen/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.4/worker.go:67
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1695
2024/08/16 07:20:54.830580 +0000 ERROR pipeline/pipeline.go:103 error: internal error: panic runtime error: invalid memory address or nil pointer dereference: 
runtime.panicmem
    /usr/local/go/src/runtime/panic.go:261
runtime.sigpanic
    /usr/local/go/src/runtime/signal_unix.go:881
github.com/matrixorigin/matrixone/pkg/sql/colexec/dispatch.printShuffleResult
    /home/jensen/matrixorigin/matrixone/pkg/sql/colexec/dispatch/dispatch.go:108
github.com/matrixorigin/matrixone/pkg/sql/colexec/dispatch.(*Dispatch).Call
    /home/jensen/matrixorigin/matrixone/pkg/sql/colexec/dispatch/dispatch.go:139
github.com/matrixorigin/matrixone/pkg/vm/pipeline.(*Pipeline).run
    /home/jensen/matrixorigin/matrixone/pkg/vm/pipeline/pipeline.go:91
github.com/matrixorigin/matrixone/pkg/vm/pipeline.(*Pipeline).MergeRun
    /home/jensen/matrixorigin/matrixone/pkg/vm/pipeline/pipeline.go:78
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).MergeRun
    /home/jensen/matrixorigin/matrixone/pkg/sql/compile/scope.go:302
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).MergeRun.func1
    /home/jensen/matrixorigin/matrixone/pkg/sql/compile/scope.go:262
github.com/panjf2000/ants/v2.(*goWorker).run.func1
    /home/jensen/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.4/worker.go:67
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1695 {"span": {"trace_id": "e31e23aa-fb38-ffde-6059-1cdac197e845", "span_id": "0ead55dcebb2d37c"}}

Expected Behavior

No response

Steps to Reproduce

Run TPC-H 10G locally.

# start the service with
./mo-service -debug-http :9876 -launch ./etc/launch-with-proxy/launch.toml > mo-service.log 2>&1 &

Additional information

No response

jensenojs commented 1 month ago

Adding the stack trace for an inner join as well.

jensenojs commented 1 month ago

The first commit that reliably reproduces this issue is e31b2ed.

commit e31b2ed084cb1b138cda0aaeba3b88663ed6dba5 (HEAD) <- first bad commit
Author: nitao <badboynt@126.com>
Date:   Wed Jul 31 17:06:29 2024 +0800

    remove unnecessary merge scope in broadcast join (#17808)

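    In some scenarios, the probe side of a broadcast join was merged more than once. This PR ensures that it is merged exactly once.
    Test coverage of these scenarios was insufficient before; several test scenarios have now been re-run.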

    Approved by: @m-schen, @ouyuanning, @aunjgr

commit 9571846966ad8c250c9cfb06433a1829be77676d
Author: ou yuanning <45346669+ouyuanning@users.noreply.github.com>
Date:   Wed Jul 31 15:51:07 2024 +0800

    fix bug: panic when get vector (type.Oid is T_any) from pool (#17801)

    fix bug: panic when get vector (type.Oid is T_any) from pool

    Approved by: @m-schen, @XuPeng-SH, @aunjgr

However, the commit just before this one hits a different panic under the same workload; that problem already existed at least as far back as eae07a2e6.


jensenojs commented 4 weeks ago

fix

heni02 commented 1 week ago

Confirmed, closed.