Limit statements cannot finish quickly with LimitPushDown in fault tolerant execution

hackeryang commented 8 months ago

Hello dear community, I executed a query with the fault tolerant execution mode in Trino 423:

trino:ta> set session retry_policy='TASK';
SET SESSION
trino:ta> select * from ta_event_2976 limit 100;

Query 20231019_112636_05204_pfmms, FAILED, 4 nodes
http://ta3:8080/ui/query.html?20231019_112636_05204_pfmms
Splits: 36,352 total, 34,533 done (NaN%)
CPU Time: 119.7s total,   943 rows/s, 2.36MB/s, 2% active
Per Node: 0.3 parallelism,   245 rows/s,  631KB/s
Parallelism: 1.0
Peak Memory: 155MB
1:55 [113K rows, 283MB] [983 rows/s, 2.47MB/s]

Query aborted by user

However, the query did not finish quickly once 100 rows were available; instead it tried to scan the whole table.

I think I roughly understand the cause. In the MPP architecture, the coordinator (i.e. Stage 0) decides when to stop scheduling splits. In fault-tolerant execution mode, the coordinator may not see the 100 rows as soon as they are produced, the way it does with streaming (pipelined) execution.
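To illustrate the hypothesis, here is a toy model in Python. It is not Trino's scheduler or exchange code. The function names (`scan_splits`, `streaming_limit`, `staged_limit`) and the split sizes are made up for the sketch, and it assumes the limit consumer only sees scan output after that output has been fully materialized.

```python
from itertools import islice


def scan_splits(splits, log):
    """Yield rows split by split, recording each split that is actually read."""
    for split_id, rows in enumerate(splits):
        log.append(split_id)
        yield from rows


def streaming_limit(splits, n):
    """Pipelined model: the consumer stops pulling as soon as it has n rows."""
    log = []
    rows = list(islice(scan_splits(splits, log), n))
    return rows, len(log)


def staged_limit(splits, n):
    """Staged model (assumption): scan output is fully materialized first."""
    log = []
    materialized = list(scan_splits(splits, log))  # stands in for a spooled exchange
    return materialized[:n], len(log)


# Hypothetical table: 1000 splits of 50 rows each.
splits = [[(s, r) for r in range(50)] for s in range(1000)]

print("splits read, pipelined:", streaming_limit(splits, 100)[1])
print("splits read, staged:   ", staged_limit(splits, 100)[1])
```

Both functions return the same 100 rows. The pipelined model stops reading once the limit is satisfied, while the staged model reads every split before the limit is applied, which resembles the behaviour in the query above.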

raunaqmorarka commented 8 months ago

Could you try this with a more recent release? This should be improved by https://github.com/trinodb/trino/pull/18862 cc: @losipiuk

losipiuk commented 7 months ago

@hackeryang actually, it will not improve yet. We have all the needed changes in the scheduler, but we also need changes to the exchange implementation. The filesystem-based exchange still requires work.

Limit statements cannot finish quickly with LimitPushDown in fault tolerant execution #19450