matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776 when running tpcc consistency checking sql during stability test on distributed mode #18823

Open: aressu1985 opened this issue 2 months ago

aressu1985 commented 2 months ago

Is there an existing issue for the same bug?

Branch Name

1.2-dev

Commit ID

42a7c45

Other Environment Information

- Hardware parameters:
  - 3 × CN: 16 CPU cores, 64 GB memory
  - 1 × DN: 16 CPU cores, 64 GB memory
  - 3 × LOG: 4 CPU cores, 16 GB memory
  - 2 × PROXY: 3 CPU cores, 6 GB memory
- OS type:
- Others:

Actual Behavior

After the stability test had been running for 80 hours in distributed mode, the following consistency-check query failed on every attempt with:

2024-09-18 09:14:06 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 16384 bytes, cap 1099511627776
2024-09-18 09:14:06 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:14:36 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776
2024-09-18 09:14:36 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:15:07 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776
2024-09-18 09:15:07 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:15:37 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 16384 bytes, cap 1099511627776
2024-09-18 09:15:37 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:16:07 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776
2024-09-18 09:16:07 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:16:37 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776
2024-09-18 09:16:37 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:17:07 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776
2024-09-18 09:17:07 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:17:37 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776
2024-09-18 09:17:37 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:18:07 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 416 bytes, cap 1099511627776
2024-09-18 09:18:07 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
2024-09-18 09:18:37 ERROR ConsistencyCheck:152 - internal error: mpool out of space, alloc 32768 bytes, cap 1099511627776
2024-09-18 09:18:37 ERROR ConsistencyCheck:153 - SQL: (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
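Every failure in the log reports the same pool cap, 1099511627776 bytes, while the rejected requests are only 416 to 32768 bytes. The short Go check below (an editorial sketch, not part of the original report) confirms that this cap is exactly 2^40 bytes (1 TiB), which is 16 times the memory of a single CN if "64G" is read as 64 GiB. This suggests the pool is presumably not failing because a full 1 TiB was physically allocated on a 64G node; the constant and variable names are hypothetical and used only for this arithmetic.

```go
package main

import "fmt"

// ReportedCap is the mpool cap printed in every "mpool out of space" error above.
const ReportedCap uint64 = 1099511627776

// CNMemory is the memory of one CN node from the environment section,
// assuming "64G" means 64 GiB (hypothetical interpretation).
const CNMemory uint64 = 64 << 30

// failedAllocs lists the allocation sizes (in bytes) rejected in the log.
var failedAllocs = []uint64{416, 16384, 32768}

func main() {
	// The cap is exactly 2^40 bytes, i.e. 1 TiB.
	fmt.Println("cap == 1 TiB:", ReportedCap == 1<<40)
	// How many times larger the cap is than one CN's memory.
	fmt.Println("cap / CN memory:", ReportedCap/CNMemory)
	// Each rejected request as a fraction of the cap.
	for _, n := range failedAllocs {
		fmt.Printf("%d bytes = %.1e of cap\n", n, float64(n)/float64(ReportedCap))
	}
}
```

Running it prints `cap == 1 TiB: true` and `cap / CN memory: 16`, followed by each failed request size as a tiny fraction of the cap.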

But when the same query was run in a new session, it succeeded:

Database changed
mysql>
mysql>
mysql> (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
Empty set (0.10 sec)

mysql> (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
Empty set (0.08 sec)

mysql> (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
Empty set (0.07 sec)

mysql> (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
Empty set (0.07 sec)

mysql> (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
Empty set (0.08 sec)

mysql> (Select d_w_id, d_id, D_NEXT_O_ID - 1 from bmsql_district) except (select o_w_id, o_d_id, max(o_id) from bmsql_oorder group by o_w_id, o_d_id);
Empty set (0.07 sec)

mo-log: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22CDC%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-42a7c45-202409142320%5C%22%7D%20%7C%3D%20%60internal%20error:%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221726611881057%22,%22to%22:%221726628258559%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

1. Run a MatrixOne (mo) cluster with the configuration listed in this issue.
2. Run the TPC-H 10G loop test in one independent tenant.
3. Run the long-running TPC-C test with 10 warehouses and 10 terminals in one independent tenant, in prepare mode.
4. Run a long-running sysbench mixed workload (insert/delete/update/select) with 75 terminals in one independent tenant, in non-prepare mode.
5. Run a second long-running sysbench mixed workload (insert/delete/update/select) with 75 terminals in another independent tenant, in non-prepare mode.

Additional information

No response

reusee commented 1 month ago

No progress.

reusee commented 1 month ago

No progress.

reusee commented 1 month ago

No progress.

reusee commented 1 month ago

No progress.

reusee commented 1 month ago

Go heap memory will gradually be replaced with off-heap memory, which will make this issue easier to solve.

reusee commented 4 weeks ago

No progress.

reusee commented 3 weeks ago

No progress.

reusee commented 3 weeks ago

No progress.

reusee commented 2 weeks ago

No progress.

reusee commented 1 week ago

No progress.

reusee commented 6 days ago

No progress.

reusee commented 1 day ago

No progress.