trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.25k stars 2.95k forks source link

Cost-based property enforcement #21785

Open devozerov opened 5 months ago

devozerov commented 5 months ago

Motivation

Modern analytical engines use relational operator properties to find optimal plan. Property is a value associated with the operator that doesn't change operator's equivalence and that can be enforced via a special enforcer operator. In distributed systems, there are two common properties:

  1. Distribution - defines how the operator data is distributed across execution units. Enforced via Exchange operator.
  2. Collation - defines how operator's output is sorted. Enforced via Sort operator.

State-of-the-art optimizers are able to find optimal placement of Exchange and Sort operators, possibly enforcing them in various places of the plan. For Exchange, the goal is to find a plan with minimal data movement while still accounting for data skew. For Sort, the goal is to enforce (or propagate) collations to allow for streaming aggregates and merge joins. Both Exchange and Sort placement should be determined via a cost-based optimization.

Examples of products that uses state-of-the-art techniques for :

  1. Greenplum ORCA uses sophisticated property propagation mechanics based on Cascades: https://15721.courses.cs.cmu.edu/spring2016/papers/p337-soliman.pdf
  2. Azue Synapse finds overlapping Exchanges with a cost-based optimizer (see paragraph 3): https://vldb.org/pvldb/vol15/p936-rajan.pdf
  3. Apache Calcite uses memorization and Cascades to find optimal enforcer placement. Any product that uses Apache Calcite can utilize this feature easily. Examples are Dremio and Apache Hive.
  4. Vertica actively uses sorted projections to utilize merge joins. Note that many of these products are lakehouse contenders, and some of them are mentioned in recent Forrester Wave lakehouse report: https://reprints2.forrester.com/#/assets/2/848/RES180732/report

Historically, both Presto and Trino relied mostly on heurstic optimizations. Cost-based optimizations are scattered across the code base, yielding multiple local minima while failing to provide globally optimal plan. Examples are overlapping Exchange placement and data skewness checks in AddExchanges, parallelism estimation in DetermineTableScanNodePartitioning, and limited partitioning propagation mechanics via opaque ConnectorPartitioningHandle. There is no Sort placement optimizer. As a result neither Presto, nor Trino can sufficiently exploit properties during optimization. However, Presto community started discussion around the nextgen query optimizer, which means that the product most likely will gather mentioned features sooner or later.

This puts Trino into vulnerable position, because sophisticated property enforcement in many cases allows other products to find better plans and demonstrate superior performance.

Proposal

This issues proposes to start the discussion about advanced property propagation in Trino. This includes (but not limited to):

  1. Allow tables expose different partitioning schemes depending in the parent requirements. This includes revision of ConnectorMetadata.getTableProperties|getCommonPartitioningHandle|makeCompatiblePartitioning as they all assume static predetermined partitioning
  2. Add cost-based AddExchange alternative based on MEMO and top-down Cascades-like algorithm.
  3. Integrate Sort propagation rule (conceptually similar to AddExchange) that will enable Trino using merge join and streaming aggregations.
findepi commented 4 months ago

cc @martint

mosabua commented 2 months ago

This has been on the agenda to discuss on the Trino Contributor Call a few times now. Ideally @devozerov can join next time and we can discuss more.

https://github.com/trinodb/trino/wiki/Contributor-meetings

devozerov commented 2 months ago

@mosabua Sure, I can prepare a couple slides if that helps

mosabua commented 1 month ago

That would be good. Next contributor call is on the 26th of Sept.

https://github.com/trinodb/trino/wiki/Contributor-meetings

Ping me for a calendar invite if you can attend and will present.