trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.45k stars 3.01k forks source link

Multi-step execution #684

Open findepi opened 5 years ago

findepi commented 5 years ago

Certain queries would benefit from materializing part of the query (even if only in memory) and re-planning the rest. Typical case is a query over partitioned table where partitioning key column filter is dynamically evaluated:

SELECT ...
FROM table WHERE partition_key = (SELECT .... )
sopel39 commented 5 years ago

For multi-step execution we could have a series of plannings: e.g:

P -> (plan and extract partial queries) -> P2, [Q1, Q2, ...Qn]
  -> (plan and extract partial queries) -> P3, [Q(n+1), Q(n+2), ... Q(n+m)] 
  -> []

So for initial plan P we extract some partial queries Q1, ..., Qn which are fully executed. Then we replan again (with new statistics) and extract further partial queries Q(n+1), ... Q(n+m). We repeat that until we fully consume plan P (e.g: P -> P1 -> P2 -> ... -> Pi -> []).

It seems that there could be some object above SqlQueryManager that would implement such multi-stage iterative approach. For example there could be some: SqlMultiStageQueryManager that owns multiple SqlQueryManager for currently running partial query executions.

Temp tables

As part of planning we could identify which subplans quality for materialization via temp tables, e.g: via TempTablePlanNode. Such TempTablePlanNode would be natural candidates for partial queries in the iterative process above. The plan would be a DAG for moment, but that is probably not an issue.

findepi commented 2 years ago

cc @losipiuk @arhimondr