Closed heni02 closed 4 months ago
The stack:
@m-schen Could you kindly take a look?
There are two log-related panics:
1.
{"level":"ERROR","time":"2024/05/30 03:21:56.727881 +0000","name":"cn-service.txn","caller":"compile/scope.go:321","msg":"panic in scope run","uuid":"30323939-3435-3164-3962-363262646239","sql":"","error":"internal error: panic runtime error: index out of range [1] with length 1:
runtime.goPanicIndex\n\t/usr/local/go/src/runtime/panic.go:114
github.com/matrixorigin/matrixone/pkg/sql/colexec.GetExprZoneMap
/go/src/github.com/matrixorigin/matrixone/pkg/sql/colexec/evalExpression.go:1077
github.com/matrixorigin/matrixone/pkg/sql/compile.ApplyRuntimeFilters
go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/runtime_filter.go:126
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).handleRuntimeFilter\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scope.go:798
github.com/matrixorigin/matrixone/pkg/sql/compile.buildScanParallelRun
/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scope.go:469
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).ParallelRun\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scope.go:346
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).RemoteRun\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scope.go:291
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).MergeRun.func1
/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scope.go:223
github.com/panjf2000/ants/v2.(*goWorker).run.func1\n\t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.4/worker.go:67\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650","session_id":"018fc784-6ada-732e-82c4-12ced49bc196","statement_id":"018fc784-707c-7ab5-8e05-0d9c157fc1c5","txn_id":"018fc784707c7af48ec1b14c4ab3b114","span":{"trace_id":"67b508b0-11b3-eb9e-1300-c1a62fe291ad","span_id":"5525b5a2f91ad0df","kind":"remote"}}
2.
{"level":"ERROR","time":"2024/05/30 02:25:42.948595 +0000","name":"cn-service.txn","caller":"compile/scopeRemoteRun.go:317","msg":"panic in scope remoteRun","uuid":"38613532-6130-3030-3066-343035353639","sql":"execute __mo_stmt_id_10","error":"internal error: panic runtime error: index out of range [-1]:
runtime.goPanicIndex\n\t/usr/local/go/src/runtime/panic.go:114
github.com/matrixorigin/matrixone/pkg/pb/pipeline.encodeVarintPipeline
go/src/github.com/matrixorigin/matrixone/pkg/pb/pipeline/pipeline.pb.go:9785
github.com/matrixorigin/matrixone/pkg/pb/pipeline.(*Pipeline).MarshalToSizedBuffer
/go/src/github.com/matrixorigin/matrixone/pkg/pb/pipeline/pipeline.pb.go:9683
github.com/matrixorigin/matrixone/pkg/pb/pipeline.(*Pipeline).Marshal
/go/src/github.com/matrixorigin/matrixone/pkg/pb/pipeline/pipeline.pb.go:9528
github.com/matrixorigin/matrixone/pkg/sql/compile.encodeScope
/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scopeRemoteRun.go:377
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).remoteRun\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scopeRemoteRun.go:341\ngithub.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).RemoteRun\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scope.go:300
github.com/matrixorigin/matrixone/pkg/sql/compile.(*Scope).MergeRun.func1
/go/src/github.com/matrixorigin/matrixone/pkg/sql/compile/scope.go:223
github.com/panjf2000/ants/v2.(*goWorker).run.func1
/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.4/worker.go:67
runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650","session_id":"018fc750-c1fe-76f2-8f18-658db02f1545","statement_id":"018fc751-4416-7946-b9da-2541cc9566d4","txn_id":"018fc7514416798aae29bd01cace9054","span":{"trace_id":"9021d205-1586-da8f-4e06-801bd634c006","span_id":"43a90171766e219a"}}
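Both of these look like problems that were exposed once we added a defer to capture panics; the same errors probably occurred before as well.
@badboynt1 I need your help confirming the first panic: is it possible for the runtime to push down a faulty expression?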
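For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of a defer that captures a panic and returns it as an error. `safeRun` is a hypothetical helper written for illustration; it is not the code in `compile/scope.go`, and the error format only loosely imitates the log lines above.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// safeRun is a hypothetical helper: it runs fn and converts any panic
// into an error carrying the panic value and the current stack trace.
func safeRun(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Without this recover the panic would crash the process;
			// with it, the failure surfaces as an ordinary error.
			err = fmt.Errorf("internal error: panic %v:\n%s", r, debug.Stack())
		}
	}()
	return fn()
}

func main() {
	err := safeRun(func() error {
		args := []int{42}
		idx := 1
		_ = args[idx] // out-of-range read, panics at run time
		return nil
	})
	fmt.Println("recovered as error:", err != nil)
}
```

Once a panic is turned into an error like this, it starts showing up in logs and client responses, which is consistent with the guess that these failures existed earlier but went unreported in this form.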
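This error seems like it can only be caused by a race: some field was modified during serialization.
My tentative guess is that several operators in the pipeline share the same Argument object, and one part no longer needs that memory for some reason (e.g. parallel run) and releases it. The object is then picked up and reused by another operator, so it gets modified while serialization is in progress.
For example, consider the following pipeline: -> merge order -> output
Here 2 is a remote run, but 1 is expanded and executed locally. During the expansion the projection may no longer be used, so it is released.
This is still only a guess and needs to be confirmed.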
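To show why a modification during serialization could surface as `index out of range [-1]`, here is a toy sketch of a "compute the size, allocate, then fill the buffer from the end" encoder, which is the general shape suggested by the `Marshal` / `MarshalToSizedBuffer` frames in the stack. The `msg` type, the one-byte length prefix, and the `between` hook are all assumptions made for illustration; this is not the generated `pipeline.pb.go` code, and the hook only stands in for a concurrent writer so the example stays deterministic.

```go
package main

import "fmt"

// msg is a toy message with a single length-prefixed field.
type msg struct{ payload []byte }

// size returns the encoded size: a 1-byte length prefix plus the payload
// (the toy format assumes payloads shorter than 128 bytes).
func (m *msg) size() int { return 1 + len(m.payload) }

// marshalToSizedBuffer fills buf backwards, from the end to the start.
func (m *msg) marshalToSizedBuffer(buf []byte) {
	i := len(buf)
	i -= len(m.payload)
	copy(buf[i:], m.payload)
	i--
	buf[i] = byte(len(m.payload)) // i is negative if the payload grew after size()
}

// marshal sizes the buffer, runs the optional hook, then encodes.
// The hook simulates another goroutine touching m between the two steps.
func marshal(m *msg, between func()) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic %v", r)
		}
	}()
	buf := make([]byte, m.size())
	if between != nil {
		between()
	}
	m.marshalToSizedBuffer(buf)
	return buf, nil
}

func main() {
	m := &msg{payload: []byte("ab")}
	// The payload grows by one byte after the buffer was sized.
	_, err := marshal(m, func() { m.payload = append(m.payload, 'c') })
	fmt.Println(err)
}
```

In this sketch the buffer is one byte too small by the time it is filled, so the write cursor runs past the front of the buffer. Whether the real panic follows the same path is unconfirmed, as the comment above says.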
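Not fixed yet; I have not located the specific race. Will continue tomorrow.
Shelving this for now; I can't find where the race is, damn it.
No progress on this today. This morning I was looking into the mpool OOM issue, and in the afternoon my computer broke.
Will analyze it together with 明松 once he is back.
For part 1: if the scope has a data race that modified the contents of the expr, that would explain it. For part 2: I have not found any data race scenario that could produce this.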
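As an illustration of the explanation for part 1, the sketch below shows how an in-place change to a shared expression breaks a reader that expects two arguments, and how a private copy avoids it. The `expr` type, `checkBinary`, and `clone` are hypothetical stand-ins, not the plan expression type or `GetExprZoneMap`; the in-place truncation is done sequentially here to stand in for what a racing goroutine might do.

```go
package main

import "fmt"

// expr is a toy expression node; the real plan expression type differs.
type expr struct {
	fn   string
	args []string
}

// checkBinary reads both operands of a binary expression. It validates the
// shape first, so a damaged node yields an error instead of an index panic.
func checkBinary(e *expr) (string, error) {
	if len(e.args) != 2 {
		return "", fmt.Errorf("%s: expected 2 args, got %d", e.fn, len(e.args))
	}
	return e.args[0] + " " + e.fn + " " + e.args[1], nil
}

// clone returns a copy with its own args slice, so in-place rewrites made
// through one holder are not visible through the other.
func (e *expr) clone() *expr {
	return &expr{fn: e.fn, args: append([]string(nil), e.args...)}
}

func main() {
	shared := &expr{fn: "=", args: []string{"id", "?"}}
	private := shared.clone()

	// Stand-in for another scope rewriting the shared node at run time.
	shared.args = shared.args[:1]

	_, err := checkBinary(shared)
	fmt.Println("shared node:", err)

	s, err := checkBinary(private)
	fmt.Println("private copy:", s, err)
}
```

Without the length check, reading `e.args[1]` on the truncated node would panic with an out-of-range index, which is the same class of failure as the first panic in the logs.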
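Working on prepare.
Based on yesterday's discussion with 明松, this is probably a race on the attributes of the operators inside the scope. If so, it should be similar to the earlier expr race in the filter operator, and may even be the same problem. When I have time I will do another pass to check whether anything else modifies scope attributes at run time.
Have not had time to check yet.
We judge this to be an occasional data race. The scope and pipeline lifecycle handling has since been heavily refactored, so it has probably been addressed already; we can keep observing.
It has not occurred again. Closed.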
Is there an existing issue for the same bug?
Branch Name
main
Commit ID
8b019d284
Other Environment Information
Actual Behavior
job:https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9288081660/job/25582868306
FATAL: mysql_stmt_execute() returned error 20101 (internal error: panic runtime error: index out of range [-1]: runtime.goPanicIndex /usr/local/go/src/runtime/panic.go:114 github.com/matrixorigin/matrixone/pkg/pb/pipeline.encodeVarintPipeline /go/src/github.com/matrixorigin/matrixone/pkg/pb/pipeline/pipeline.pb.go:9785 github.com/matrixorigin/matrixone/pkg/pb/pipeline.(Pipeline).MarshalToSizedBuffer /go/src/github.com/matrixorigin/matrixone/pkg/pb/pipeline/pipeline.pb.go:9683 github.com/matrixorigin/matrixone/pkg/pb/pipeline.(Pipeline).Marshal /go/) for query 'DELETE FROM sbtest10 WHERE id=?' FATAL: `thread_run' function failed: oltp_delete.lua:33: SQL error, errno = 20101, state = 'HY000': internal error: panic runtime error: index out of range [-1]: runtime.goPanicIndex /usr/local/go/src/runtime/panic.go:114 github.com/matrixorigin/matrixone/pkg/pb/pipeline.encodeVarintPipeline /go/src/github.com/matrixorigin/matrixone/pkg/pb/pipeline/pipeline.pb.go:9785 github.com/matrixorigin/matrixone/pkg/pb/pipeline.(Pipeline).MarshalToSizedBuffer /go/src/github.com/matrixorigin/matrixone/pkg/pb/pipeline/pipeline.pb.go:9683 github.com/matrixorigin/matrixone/pkg/pb/pipeline.(Pipeline).Marshal /go/
mo log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22jV9%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240529%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221717035752021%22,%22to%22:%221717039571624%22%7D%7D%7D&schemaVersion=1&orgId=1
Expected Behavior
No response
Steps to Reproduce
Additional information
No response