trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.14k stars 2.92k forks source link

MicroPlans - Allow choosing between different plans on the split level #13534

Open assaf2 opened 2 years ago

assaf2 commented 2 years ago

There are certain connectors that can't always make pushdown guarantees on the coordinator level. For example, ORC files may contain headers describing the min and max values within a certain row group. We want to give Hive and Iceberg connectors the ability to tell the engine for each split if a certain filter is supported. For example, let's assume the engine pushes down the filter col > 1, Hive\Iceberg could respond with 2 plans - one that consumes the filter and another that doesn't. Then, for each split in which the filter is always true (in our example, when the min value is greater than 1), Hive\Iceberg would use the plan that consumes the filter. This approach is a step directing to exploratory optimizer.

In general, we want to give connectors the ability to choose between different plans on the split level. A new abstraction will be introduced - MicroPlanHandle (temporary name) which is a handle to a transformation of a table. For certain metadata APIs, connectors will have the ability to return several MicroPlanHandles. These APIs will only be those that affect the plan on the stage level (for example, they can’t affect any Exchange operation). The optimization process will run in 2 phases. The first phase is the current optimization process. The second phase is where MicroPlanHandles are taken into consideration. At any given time, the connector won’t have the knowledge of which phase is running.

Changes in ConnectorMetadata

The signatures of the following APIs will be deprecated (and eventually removed) and replaced with:

Optional<List<(MicroPlanHandle, ResidualFilter)>> applyFilter(TableHandle, Optional<MicroPlanHandle>, Filter)
Optional<List<(MicroPlanHandle, ResidualProjection)>> applyProjection(TableHandle, Optional<MicroPlanHandle>, Projection)
Optional<List<MicroPlanHandle>> applyAggregation(TableHandle, Optional<MicroPlanHandle>, AggregationFunction)

In the first phase, the engine won’t pass a MicroPlanHandle and will only take the first element of the returned list. Trino might create a new table handle using a new API combine(TableHandle, MicroPlanHandle) -> TableHandle. Another option would be to embed the MicroPlanHandle inside the TableHandle. In the second phase, all the elements in the returned list will be taken. The engine might put a limitation on the amount of MicroPlanHandles a connector can generate by pruning the last MicroPlanHandles off the list. Therefore, the connector should place the broadest required residual element as the first element and then all the other elements ordered by priority.

Worker SPI

A new argument will be passed into ConnectorPageSourceProvider#createPageSource - Optional<List<Pair<MicroPlanHandle, List<ColumnHandle>>> instead of the existing List<ColumnHandle> argument.

ConnectorPageSource will contain a new method: Optional<Integer> getChosenMicroPlan(). The connector will return the ordinal number of the MicroPlanHandle it has chosen.

Plans that are not used by any split won’t be compiled and cached in the worker (lazy approach).

findepi commented 2 years ago

@assaf2 can you please add an example describing what a micro plan is?

assaf2 commented 2 years ago

@findepi it would be an empty interface like ConnectorTableHandle. Connectors will use it to store information about different ways to transform tableHandles. It may contain information about projected columns, filters, aggregations, etc. For example, here is how push predicate will work: Assuming the query SELECT * FROM t WHERE partition_col = 1 AND col1 = 2 AND col2 = 3 and that connector X knows for sure that it can handle predicates on partition_col, that it may or may not be able to handle predicates on col1 and that it can't handle predicates on col2. First optimizer phase -

Second optimizer phase -

Then, for each split, the connector will use TableHandle2 and choose between MicroPlanHandle3 and MicroPlanHandle4.

@martint do you wish to add something?

findepi commented 2 years ago

Why does it need to be a new abstraction if all it does is to pass it along with table handle? Isn't that this information could be just a connector-specific field within the connector's table handle?

assaf2 commented 2 years ago

Because each MicroPlan will have different plan on the engine (that's why the engine will need to be aware of which MicroPlan was chosen for each split). The point is to give more flexibility to connectors that can't make guarantees for the entire table.