matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0
1.78k stars 275 forks source link

[Bug]: TPCH 10G Q21 was lost connection and the following connection can not be established #7635

Closed aressu1985 closed 1 year ago

aressu1985 commented 1 year ago

Is there an existing issue for the same bug?

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93): 8051e3b3e28311b6a52b34fad9a62bc3727c88bf
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

tools log:

2023-01-15 07:21:33 Now start to compare query result for [report/TPCH_10/q20.txt,golden/TPCH_10/q20.txt] 2023-01-15 07:21:33 The query q20 has be executed successfully,and cost: 30226

2023-01-15 07:21:33 Now start to execute the query queries/q21.sql,please wait..... 2023-01-15 07:22:14 The query q21 has failed to be executed.

ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query

2023-01-15 07:22:14 Now start to execute the query queries/q22.sql,please wait..... 2023-01-15 07:22:14 The query q22 has failed to be executed.

ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1:6001' (111)

2023-01-15 07:22:14 Now start to execute the query queries/q3.sql,please wait..... 2023-01-15 07:22:14 The query q3 has failed to be executed.

ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1:6001' (111)

2023-01-15 07:22:14 Now start to execute the query queries/q4.sql,please wait..... 2023-01-15 07:22:14 The query q4 has failed to be executed.

ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1:6001' (111)

mo log:

023/01/15 07:21:20.877698 +0800 INFO hakeeper-client-backend morpc/backend.go:654 start connect to remote {"remote": "127.0.0.1:32001", "backend-id": "4b5dba1d-8c50-4524-b2b7-3735655734e3"} 2023/01/15 07:21:21.055185 +0800 INFO hakeeper-client-backend morpc/backend.go:658 connect to remote succeed {"remote": "127.0.0.1:32001", "backend-id": "4b5dba1d-8c50-4524-b2b7-3735655734e3"} 2023/01/15 07:21:21.055420 +0800 INFO rpc-client[hakeeper-client([connectToHAKeeper])] morpc/client.go:328 gc inactive backends task started 2023/01/15 07:21:21.055651 +0800 INFO hakeeper-client-backend morpc/backend.go:455 read loop started {"remote": "127.0.0.1:32001", "backend-id": "4b5dba1d-8c50-4524-b2b7-3735655734e3"} 2023/01/15 07:21:21.055678 +0800 INFO hakeeper-client-backend morpc/backend.go:360 write loop started {"remote": "127.0.0.1:32001", "backend-id": "4b5dba1d-8c50-4524-b2b7-3735655734e3"} 2023/01/15 07:21:21.328110 +0800 INFO logservice/hakeeper_client.go:382 isHAKeeper: true, err: 2023/01/15 07:21:21.477621 +0800 INFO logservice/hakeeper_client.go:382 isHAKeeper: true, err: 2023/01/15 07:21:23.108505 +0800 INFO frontend/util.go:515 time of Exec.Build : 2.657437586s connectionId 1402 2023/01/15 07:21:23.447773 +0800 INFO frontend/util.go:515 rowCount 2 time of getDataFromPipeline : 18.064µs processBatchTime 10.671µs row2colTime 0s allocateOutBufferTime 500ns outputQueue.flushTime 0s processBatchTime - row2colTime - allocateOutbufferTime - flushTime 10.171µs restTime(=tTime - row2colTime - allocateOutBufferTime) 17.564µs protoStats flushCount 0 writeCount 1 writeBytes 0 0 MB connectionId 1402 2023/01/15 07:21:23.447856 +0800 INFO frontend/util.go:515 time of Exec.Run : 229.979466ms connectionId 1402 2023/01/15 07:21:23.448029 +0800 INFO frontend/util.go:500 query trace status {"connection_id": 1402, "statement": "show tables",

023/01/15 07:21:23.594430 +0800 INFO frontend/util.go:515 time of Exec.Build : 2.870258725s connectionId 1393 2023/01/15 07:21:23.594684 +0800 ERROR frontend/util.go:510 write tcp4 127.0.0.1:6001->10.222.1.128:39486: write: broken pipe {"session_info": "connectionId 1393"} github.com/matrixorigin/matrixone/pkg/frontend.logError /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/util.go:510 github.com/matrixorigin/matrixone/pkg/frontend.(MysqlCmdExecutor).doComQuery /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/mysql_cmd_executor.go:4122 github.com/matrixorigin/matrixone/pkg/frontend.(MysqlCmdExecutor).ExecRequest /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/mysql_cmd_executor.go:4275 github.com/matrixorigin/matrixone/pkg/frontend.(Routine).handleRequest /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/routine.go:165 github.com/matrixorigin/matrixone/pkg/frontend.(RoutineManager).Handler /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/routine_manager.go:290 github.com/fagongzi/goetty/v2.(server).doConnection /home/github/go/pkg/mod/github.com/fagongzi/goetty/v2@v2.0.3-0.20221212132037-abf2d4c05484/application.go:381 github.com/fagongzi/goetty/v2.(server).doStart.func2.1 /home/github/go/pkg/mod/github.com/fagongzi/goetty/v2@v2.0.3-0.20221212132037-abf2d4c05484/application.go:344 2023/01/15 07:21:23.593691 +0800 ERROR log-service taskservice/task_service_cron.go:97 query cron tasks failed {"uuid": "7c4dccb4-4d3c-41f8-b482-5251dc7a41bf", "error": "context deadline exceeded"} github.com/matrixorigin/matrixone/pkg/taskservice.(taskService).fetchCronTasks /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/taskservice/task_service_cron.go:97 github.com/matrixorigin/matrixone/pkg/common/stopper.(Stopper).doRunCancelableTask.func1 /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/common/stopper/stopper.go:259 2023/01/15 07:21:23.595098 +0800 ERROR frontend/util.go:510 query trace status {"connection_id": 1393, "statement": "select cron_task_id, task_metadata_id, task_metadata_executor, task_metadata_context, task_metadata_option, cron_expr, next_time, trigger_times, create_at, update_at from mo_task.sys_cron_task", "status": "fail", "error": "write tcp4 127.0.0.1:6001->10.222.1.128:39486: write: broken pipe", "span": {"trace_id": "2768d714-9462-11ed-8d55-b07b25f8b524", "kind": "statement"}, "session_info": "connectionId 1393"} github.com/matrixorigin/matrixone/pkg/frontend.logError

github.com/matrixorigin/matrixone/pkg/frontend.logError /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/util.go:510 github.com/matrixorigin/matrixone/pkg/frontend.logStatementStringStatus /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/util.go:494 github.com/matrixorigin/matrixone/pkg/frontend.logStatementStatus /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/util.go:484 github.com/matrixorigin/matrixone/pkg/frontend.(MysqlCmdExecutor).doComQuery /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/mysql_cmd_executor.go:4130 github.com/matrixorigin/matrixone/pkg/frontend.(MysqlCmdExecutor).ExecRequest /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/mysql_cmd_executor.go:4275 github.com/matrixorigin/matrixone/pkg/frontend.(Routine).handleRequest /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/routine.go:165 github.com/matrixorigin/matrixone/pkg/frontend.(RoutineManager).Handler /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/routine_manager.go:290 github.com/fagongzi/goetty/v2.(server).doConnection /home/github/go/pkg/mod/github.com/fagongzi/goetty/v2@v2.0.3-0.20221212132037-abf2d4c05484/application.go:381 github.com/fagongzi/goetty/v2.(server).doStart.func2.1 /home/github/go/pkg/mod/github.com/fagongzi/goetty/v2@v2.0.3-0.20221212132037-abf2d4c05484/application.go:344 2023/01/15 07:21:23.595391 +0800 ERROR frontend/util.go:525 rt send response failed &{1 0 3 0xca569ffdb0 0 0 0}. error:write tcp4 127.0.0.1:6001->10.222.1.128:39486: write: broken pipe connectionId 1393 github.com/matrixorigin/matrixone/pkg/frontend.logErrorf /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/util.go:525 github.com/matrixorigin/matrixone/pkg/frontend.(Routine).handleRequest /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/routine.go:171 github.com/matrixorigin/matrixone/pkg/frontend.(RoutineManager).Handler /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/routine_manager.go:290 github.com/fagongzi/goetty/v2.(server).doConnection /home/github/go/pkg/mod/github.com/fagongzi/goetty/v2@v2.0.3-0.20221212132037-abf2d4c05484/application.go:381 github.com/fagongzi/goetty/v2.(server).doStart.func2.1

github.com/matrixorigin/matrixone/pkg/hakeeper/task.(scheduler).queryTasks /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/hakeeper/task/task_scheduler.go:118 github.com/matrixorigin/matrixone/pkg/hakeeper/task.(scheduler).Schedule /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/hakeeper/task/task_scheduler.go:53 github.com/matrixorigin/matrixone/pkg/logservice.(store).taskSchedule /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/logservice/store_hakeeper_check.go:199 github.com/matrixorigin/matrixone/pkg/logservice.(store).hakeeperCheck /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/logservice/store_hakeeper_check.go:143 github.com/matrixorigin/matrixone/pkg/logservice.(store).ticker /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/logservice/store.go:671 github.com/matrixorigin/matrixone/pkg/logservice.(store).startHAKeeperReplica.func1 /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/logservice/store.go:224 github.com/matrixorigin/matrixone/pkg/common/stopper.(*Stopper).doRunCancelableTask.func1 /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/common/stopper/stopper.go:259 2023/01/15 07:21:24.038479 +0800 INFO frontend/util.go:515 check password succeeded connectionId 1403 2023/01/15 07:21:24.113394 +0800 INFO frontend/util.go:500 query trace {"connection_id": 1403, "query": "use mo_task", "session_info": "connectionId 1403"} 2023/01/15 07:21:24.251023 +0800 INFO frontend/util.go:515 User f8d24f63-cb8e-4773-9569-1224f92fba43 change database from [] to [mo_task] connectionId 1403 2023/01/15 07:21:24.251155 +0800 INFO frontend/util.go:500 query trace status {"connection_id": 1403, "statement": "use mo_task", "status": "success", "span": {"trace_id": "297a11f1-9462-11ed-8d55-b07b25f8b524", "kind": "statement"}, "session_info": "connectionId 1403"} 2023/01/15 07:21:24.251309 +0800 INFO frontend/util.go:500 query trace {"connection_id": 1403, "query": "show tables", "session_info": "connectionId 1403"} 2023/01/15 07:21:24.279735 +0800 INFO frontend/util.go:515 time of Exec.Build : 15.323932ms connectionId 1403 2023/01/15 07:21:24.280816 +0800 INFO frontend/util.go:515 rowCount 2 time of getDataFromPipeline : 11.519µs processBatchTime 8.764µs row2colTime 0s mo-tpch.tar.gz

Expected Behavior

No response

Steps to Reproduce

git clone https://github.com/matrixorigin/mo-tpch.git
./run.sh -c -s 10
./run.sh -l -s 10
./run.sh -q all -s 10 -t 5

Additional information

No response

aressu1985 commented 1 year ago

the error is similar with #7601

nnsgmsone commented 1 year ago

This problem should be caused by a memory leak, and is trying to fix it (more trouble).

nnsgmsone commented 1 year ago

fixed by #7661

aressu1985 commented 1 year ago

tracking for a more day

aressu1985 commented 1 year ago

tracking for a more day

aressu1985 commented 1 year ago

fixed:ce1aae5e38bfab63d644440fb8113c455f31e975