Describe the bug
One of our users hit a SQL query (the ods.ods_mk_mcos table has 270 million rows) that kept running without returning any results. While monitoring the batch query, we noticed that RowSeqScan stopped fetching data after a while. In addition, even after the query was killed, a residual MPP task remained on one CN node, which kept the version pinned instead of releasing it. The query reproduces the issue consistently in one of our users' environments. The execution plan contains SortAgg and MergeSortExchange, and with both disabled the query executes normally, so there may be a deadlock somewhere. FYI, MergeSortExchange differs from a regular Exchange in that it has to fetch data from all of its parallel inputs before it can return anything.
SQL:
SELECT count(1) AS cnt, company_id FROM ods.ods_mk_mcos_files GROUP BY company_id ORDER BY cnt DESC LIMIT 100;
Plan:
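To make the last point concrete, here is a minimal sketch of a k-way merge over sorted inputs. It is not RisingWave's MergeSortExchange code; `merge_sorted` and its in-memory `Vec` inputs are hypothetical stand-ins for the upstream exchange sources. It only shows why such an operator needs a head element from every input before it can emit a single row, which would make it sensitive to any one input that stalls.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical k-way merge over inputs that are each already sorted ascending.
/// In a real exchange each `next()` would be a wait on an upstream channel.
fn merge_sorted(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    let mut heap = BinaryHeap::new();

    // Priming phase: pull the head of EVERY input before emitting anything.
    // If one upstream never produced a row (or an end-of-stream marker),
    // an operator like this could not get past this loop.
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, idx)));
        }
    }

    let mut out = Vec::new();
    // Emit the smallest head, then refill from the input it came from.
    while let Some(Reverse((v, idx))) = heap.pop() {
        out.push(v);
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}
```

A regular exchange can forward whichever input has data ready, whereas a merge like the one above blocks on the slowest input. If that input is itself waiting on a resource held by the blocked side, the query would hang in the way described. Whether that is what happens here is an assumption, not something confirmed by the report.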
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
No response