risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
6.97k stars 574 forks source link

[RFC] Heuristic Join Reordering for Multi-Way Inner Joins #2530

Open jon-chuang opened 2 years ago

jon-chuang commented 2 years ago

Is your feature request related to a problem? Please describe. This RFC is meant to propose a heuristic join reordering rule for multi-way inner joins that is meant to:

  1. eliminate cross joins entirely if possible, else reduce to the minimum possible number
  2. use selectivity heuristics such as presence of primary keys as subsets of join keys to choose the join ordering within each connected component.

Here is a description of the algorithm proposed:


Our heuristic join reordering algorithm will try to perform a left-deep join.
It will try to do the following:

1. First, split the join graph, with eq join conditions as graph edges, into their connected
   components. Repeat the procedure in 2. with the largest connected components down to
   the smallest.
2. For each connected component, add joins to the chain, prioritizing adding those
   joins to the bottom of the chain if their join conditions have:
       a. eq joins between primary keys on both sides 
       b. eq joins with primary keys on one side
       c. more equijoin conditions
   in that order. This forms our selectivity heuristic.
3. Thirdly, we will emit a left-deep cross-join of each of the left-deep joins of the
   connected components. Depending on the type of plan, this may result in a planner failure 
   (e.g. for streaming). No cross-join will be emitted for a single connected component.
4. Finally, we will emit, above the left-deep join tree:
       a. a filter with the non eq conditions 
       b. a projection which reorders the output column ordering to agree with the
          original ordering of the joins.
   The filter will then be pushed down by another filter pushdown pass.

Related:

Tasks:

Low priority tasks:

keanji-x commented 2 years ago

My idea is similar to this. But there are some details:

jon-chuang commented 2 years ago

The project should be absorbed to MultipleJoin for the situation join -> project -> join.

Agreed on this. But I decided to make this a low priority after the rest has been done.

equal join associated with 2 tables > other join associated with 2 tables > cross product = join associated with multiple tables

Currently, the plan for the RFC is more fine grained:

But it omits the last > sign. We could definitely detect (non-equijoin associated with 2 tables) and add this into the join priority. I'm not sure we can take advantage of this. For now, it will still result in an error for streaming. Any ideas @fuyufjh ?

For me, I did not consider 3 columns as a join condition (A join B join{a.x+b.x = c.x} C). This is considered a filter on a cross join. Technically, there is some optimization that could turn this into an eq join but I don't think we implement it yet.

keanji-x commented 2 years ago

For me, I did not consider 3 columns as a join condition (A join B join{a.x+b.x = c.x} C). This is considered a filter on a cross join.

You mean extract the conditions associated with multiple tables and create a filter post after the MultipleJoin?

jon-chuang commented 2 years ago

You mean extract the conditions associated with multiple tables and create a filter post after the MultipleJoin?

Yes. It will be pushed down into one of the joins after the multijoin has been reordered. However, we don't try to make the joins adjacent during the join ordering. I'm guessing this is equivalent to calcite.

keanji-x commented 2 years ago

You mean extract the conditions associated with multiple tables and create a filter post after the MultipleJoin?

Yes. It will be pushed down into one of the joins after the multijoin has been reordered. However, we don't try to make the joins adjacent during the join ordering. I'm guessing this is equivalent to calcite.

However, it may cause some cross-products, e.g., B join C join A for the above example. Maybe we can use some constraints to avoid it. For example, we can maintain some partial order relations to enforce C must appear after B and A.

jon-chuang commented 2 years ago

However, it may cause some cross-products, e.g., B join C join A for the above example. Maybe we can use some constraints to avoid it. For example, we can maintain some partial order relations to enforce A must appear after B and C.

I assume that A join B join{a.x+b.x = c.x} C is probably more like (A join{a.y = b.y} B) join{a.x+b.x = c.x} C, So we will be doing a cross product between (A join B) and C. Technically, one could do a Project(A join B, (*, a.x+b.x AS z)) join {z = c.x} C.

As for pairwise non-eq join and multi-table conditions, we could definitely have another round of optimization that treats these as being higher selectivity than cross join, and use these to choose the order in which connected components (in terms of eq join) are joined together.

It can be explored (as a lower priority task) after prioritizing equijoins is complete.

There are many ways to improve the join ordering. But do note that join ordering is generally hard. For heuristics, I think we may want to first keep it simple. We could leave more nuanced decisions to a statistics-driven optimizer.

lmatz commented 2 years ago

@jon-chuang May check out https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java and https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java

https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/tools/Programs.java#L186 as the entry point

jon-chuang commented 2 years ago

@lmatz I think those already use table statistics https://github.com/apache/calcite/blob/86b34f7ba81cc85aec4589f8fa4dcbf503ad8677/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L127

But seems to use some selectivity heuristics based on the condition as well (https://github.com/apache/calcite/blob/86b34f7ba81cc85aec4589f8fa4dcbf503ad8677/core/src/main/java/org/apache/calcite/rel/metadata/RelMdUtil.java#L392)

The second one seems related to: https://www.cockroachlabs.com/blog/join-ordering-ii-the-ikkbz-algorithm/

keanji-x commented 2 years ago

For the left-deep tree, it will first check the priority of the condition and then select the factor according to cardinality if priority is equal: https://github.com/apache/calcite/blob/86b34f7ba81cc85aec4589f8fa4dcbf503ad8677/core/src/main/java/org/apache/calcite/rel/rules/LoptOptimizeJoinRule.java#L772

The condition priority is set in https://github.com/apache/calcite/blob/86b34f7ba81cc85aec4589f8fa4dcbf503ad8677/core/src/main/java/org/apache/calcite/rel/rules/LoptMultiJoin.java#L536

By the way, we need a left-deep tree or a bushy tree for join reordering?

st1page commented 2 years ago

LGTM

skyzh commented 2 years ago

Some off-topic things: weeks ago I've written a so-called delta-join solver: https://github.com/singularity-data/risingwave/blob/main/src/frontend/src/optimizer/delta_join_solver.rs. It seems to have done exactly the same thing as step 1 and 2, but specifically designed to output one lookup path in delta join.

lmatz commented 2 years ago
  • how will you get the primary key? I propose just analyzing only when the input is a LogicalScan. the pk_indices seems a special thing only for stream operators.

We also support pk for normal table, right?

We may do a similar but different pk derivation for the batch plan.

As long as all the columns of the pk of a table are preserved through some operators(recursively), then we can consider the output of these operators as having the same pk of table.

A simple version requires any column in the pk cannot be changed by some function but only simply referred in an operator.

A more sophisticated version may allow some column goes through some function that does not hurt the uniqueness of a pk.

st1page commented 2 years ago

how about just doing it simply: #2242 and then add a method like can_look_up_by(idx: &[usize])

fuyufjh commented 2 years ago

how will you get the primary key? I propose just analyzing only when the input is a LogicalScan. the pk_indices seems a special thing only for stream operators.

There are likely to be some project and filter on top of LogicalScan, I think we should match them also.

We may do a similar but different pk derivation for the batch plan.

Seems not necessary. On the other hand, if we support indexes, you may see that what we want here are actually indexes, not only PK.

how about just doing it simply: https://github.com/singularity-data/risingwave/issues/2343and then add a method like can_look_up_by(idx: &[usize])

Also looks okay to me

fuyufjh commented 2 years ago

FYI. A very comprehensive summary about join reordering (especially the "Greedy Heuristic" section, since we may hope it to be simple for now) -> https://db.in.tum.de/teaching/ws1415/queryopt/chapter3.pdf

keanji-x commented 2 years ago

FYI. A very comprehensive summary about join reordering (especially the "Greedy Heuristic" section, since we may hope it to be simple for now) -> https://db.in.tum.de/teaching/ws1415/queryopt/chapter3.pdf

It has a very nice textbook http://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf

fuyufjh commented 2 years ago

@XieJiann Hi, please join us in Slack https://risingwave-community.slack.com/archives/C03FWTNVB7B

jon-chuang commented 2 years ago

how will you get the primary key? I propose just analyzing only when the input is a LogicalScan. the pk_indices seems a special thing only for stream operators.

If the pk indices are not present in the join condition, then they are irrelevant for our heuristic!