matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: lost connect when run SSB 100G Q4.3 no filter query #7505

Open aressu1985 opened 1 year ago

aressu1985 commented 1 year ago

Is there an existing issue for the same bug?

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93): b8d83a2d1b04d2533242cf80a0a0fe671ed23fbb
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

The SSB 100G Q4.3 no-filter query failed because the connection to MO was lost. The suspicious errors in the log are:

2023/01/07 23:09:04.589954 +0800 INFO frontend/util.go:499 query trace status {"connection_id": 1097, "statement": "select task_id, task_metadata_id, task_metadata_executor, task_metadata_context, task_metadata_option, task_parent_id, task_status, task_runner, task_epoch, last_heartbeat, result_code, error_msg, create_at, end_at from mo_task.sys_async_task where task_status = 0 order by task_id", "status": "success", "span": {"trace_id": "399be8dd-8e9d-11ed-902e-b07b25f8b524", "kind": "statement"}, "session_info": "connectionId 1097"}
2023/01/07 23:09:05.131382 +0800 ERROR fileservice/local_etl_fs.go:73 error: file sys/logs/2023/01/07/rawlog/1673104144_7c4dccb4-4d3c-41f8-b482-5251dc7a41bf_ALL.csv already exists
github.com/matrixorigin/matrixone/pkg/fileservice.(LocalETLFS).Write
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/fileservice/local_etl_fs.go:73
github.com/matrixorigin/matrixone/pkg/fileservice.(FileServices).Write
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/fileservice/file_services.go:116
github.com/matrixorigin/matrixone/pkg/util/export.(FSWriter).Write
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/export/fs_writer.go:134
github.com/matrixorigin/matrixone/pkg/util/export.(FSWriter).WriteString
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/export/fs_writer.go:158
github.com/matrixorigin/matrixone/pkg/util/trace.batchCSVHandler.NewItemBatchHandler.func1
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/trace/buffer_pipe.go:111
github.com/matrixorigin/matrixone/pkg/util/trace.batchCSVHandler.NewItemBatchHandler.func2
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/trace/buffer_pipe.go:124
github.com/matrixorigin/matrixone/pkg/util/export.(bufferExportReq).handle
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/export/batch_processor.go:124
github.com/matrixorigin/matrixone/pkg/util/export.(MOCollector).doExport
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/export/batch_processor.go:359
2023/01/07 23:09:05.131477 +0800 ERROR fileservice/local_etl_fs.go:73 error: file sys/logs/2023/01/07/rawlog/1673104144_7c4dccb4-4d3c-41f8-b482-5251dc7a41bf_ALL.csv already exists
github.com/matrixorigin/matrixone/pkg/fileservice.(LocalETLFS).Write
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/fileservice/local_etl_fs.go:73
github.com/matrixorigin/matrixone/pkg/fileservice.(FileServices).Write
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/fileservice/file_services.go:116
github.com/matrixorigin/matrixone/pkg/util/export.(FSWriter).Write
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/export/fs_writer.go:134
github.com/matrixorigin/matrixone/pkg/util/export.(FSWriter).WriteString
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/export/fs_writer.go:158
github.com/matrixorigin/matrixone/pkg/util/trace.batchCSVHandler.NewItemBatchHandler.func1

2023/01/07 23:10:08.331652 +0800 ERROR hakeeper-client-backend morpc/backend.go:472 read from backend failed {"remote": "127.0.0.1:32001", "backend-id": "529afce7-b795-435c-ba76-8268aa8756a3", "error": "read tcp4 127.0.0.1:38894->127.0.0.1:32001: use of closed network connection"}
2023/01/07 23:10:08.331855 +0800 ERROR hakeeper-client-backend morpc/backend.go:477 read loop stopped {"remote": "127.0.0.1:32001", "backend-id": "529afce7-b795-435c-ba76-8268aa8756a3"}
2023/01/07 23:10:08.331914 +0800 ERROR hakeeper-client-backend v2@v2.0.3-0.20221212132037-abf2d4c05484/session.go:496 close conneciton failed {"remote": "127.0.0.1:32001", "backend-id": "529afce7-b795-435c-ba76-8268aa8756a3", "session-id": 0, "error": "close tcp4 127.0.0.1:38894->127.0.0.1:32001: use of closed network connection"}
2023/01/07 23:10:08.331945 +0800 INFO hakeeper-client-backend morpc/backend.go:364 write loop stopped {"remote": "127.0.0.1:32001", "backend-id": "529afce7-b795-435c-ba76-8268aa8756a3"}
2023/01/07 23:10:08.396192 +0800 ERROR rpc-client[hakeeper-client([connectToHAKeeper])] morpc/client.go:334 gc inactive backends task stopped
2023/01/07 23:10:08.396249 +0800 ERROR morpc/message.go:41 error: invalid input: timeout has invalid deadline
github.com/matrixorigin/matrixone/pkg/common/morpc.RPCMessage.GetTimeoutFromContext
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/common/morpc/message.go:41
github.com/matrixorigin/matrixone/pkg/common/morpc.(server).startWriteLoop.func1
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/common/morpc/server.go:286
github.com/matrixorigin/matrixone/pkg/common/stopper.(Stopper).doRunCancelableTask.func1
    /data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/common/stopper/stopper.go:259
2023/01/07 23:10:08.437851 +0800 ERROR hakeeper-client-backend morpc/backend.go:472 read from backend failed {"remote": "127.0.0.1:32001", "backend-id": "7d1ad83c-48e7-4675-92e5-9d3a37fc8838", "error": "read tcp4 127.0.0.1:51014->127.0.0.1:32001: use of closed network connection"}
2023/01/07 23:10:08.437999 +0800 ERROR hakeeper-client-backend morpc/backend.go:477 read loop stopped {"remote": "127.0.0.1:32001", "backend-id": "7d1ad83c-48e7-4675-92e5-9d3a37fc8838"}
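
A note for reading the hakeeper-client-backend errors above: in Go, "use of closed network connection" is the text of net.ErrClosed, which a Read or Close returns when the connection was already closed by the same process. So these lines may be a symptom of the client side shutting down rather than the root cause. The following is a minimal standalone sketch of that behavior using only the standard library; it is not MatrixOne code, and the helper names readAfterLocalClose and isLocalClose are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// readAfterLocalClose (hypothetical helper) opens a loopback TCP connection,
// closes it on the local side, and then tries to read from it.
// It returns the error produced by that read.
func readAfterLocalClose() error {
	ln, err := net.Listen("tcp4", "127.0.0.1:0") // port 0: let the OS pick one
	if err != nil {
		return err
	}
	defer ln.Close()

	// Accept the connection in the background and hold it until the peer closes.
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		c.Read(make([]byte, 1))
	}()

	conn, err := net.Dial("tcp4", ln.Addr().String())
	if err != nil {
		return err
	}
	conn.Close() // local close, as a client would do while shutting down

	// Reading from a connection we closed ourselves fails immediately.
	_, err = conn.Read(make([]byte, 1))
	return err
}

// isLocalClose (hypothetical helper) reports whether err wraps net.ErrClosed.
func isLocalClose(err error) bool {
	return errors.Is(err, net.ErrClosed)
}

func main() {
	err := readAfterLocalClose()
	fmt.Println(err)
	fmt.Println(isLocalClose(err))
}
```

The printed error has the same shape as the log entries ("read tcp4 <local>-><remote>: use of closed network connection"). A connection dropped by the remote side would normally surface as EOF or a reset error instead, so it may be worth looking at what closed the client before 23:10:08.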

mo-ssb-lostcon.tar.gz
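
On the two "already exists" errors from local_etl_fs.go: the numeric prefix of the file name, 1673104144, reads as a Unix timestamp in seconds (23:09:04 +0800 on 2023/01/07), followed by a UUID. If the name is built that way and the write path refuses to overwrite, two exports within the same second would collide. That is an assumption from the log, not something verified against the LocalETLFS source. A minimal sketch of the failure mode in plain Go, with a hypothetical writeNew helper and a made-up file name:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// writeNew (hypothetical helper) creates path and writes data to it.
// O_EXCL makes the open fail if the file already exists.
func writeNew(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(data)
	return err
}

// secondWriteCollides writes the same file name twice in a temporary
// directory and reports whether the second write failed with fs.ErrExist.
func secondWriteCollides() (bool, error) {
	dir, err := os.MkdirTemp("", "etl-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	// Made-up name in the same <seconds>_<id>_ALL.csv shape as the log.
	name := filepath.Join(dir, "1673104144_example-node_ALL.csv")

	if err := writeNew(name, []byte("first batch\n")); err != nil {
		return false, err
	}
	err = writeNew(name, []byte("second batch\n"))
	return errors.Is(err, fs.ErrExist), nil
}

func main() {
	collided, err := secondWriteCollides()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("second write collided:", collided)
}
```

If this is what happens, the affected trace/log batch is dropped, but it is not obvious that this is related to the lost connection.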

Expected Behavior

No response

Steps to Reproduce

Run the SSB 100G Q4.3 no-filter test:
select year(d_datekey) as year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit, c_region, s_nation, p_category
from lineorder
join date on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
group by year(d_datekey), s_city, p_brand, c_region, s_nation, p_category;

Additional information

No response

aunjgr commented 4 months ago

on leave

aunjgr commented 4 months ago

not working on it today

aressu1985 commented 6 days ago

Update on 2024.11.06: explain result: explain.txt

90s CPU profile: ssb_q43_no_filter_90.zip