timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.87k stars 853 forks source link

Do proper Path planning for upper rels in multi-node queries #3807

Closed erimatnor closed 1 month ago

erimatnor commented 2 years ago

Currently, some multi-node push-down optimizations, such as LIMIT and DISTINCT, are applied at the deparsing stage, rather than first being planned as Upper relation Paths on the data node scan relations.

One reason for this is that earlier PostgreSQL versions didn't have support in the foreign data wrapper (and API) to generate paths for Upper relation kinds.

To properly implement push-down plans, we should generate the appropriate Paths (including their cost) when push-down is requested via partitionwise planning via the following UpperRelationKinds:

Future push-down optimizations might also involve upper relations like UPPERREL_WINDOW for pushing down windowing functions.

Once these upperrel Paths are created, the planner will generate the least cost plan, and generating the remote SQL statement will simply be the task of deparsing that plan (for the data node scan) into a SQL statement.

Note that adding the proper Paths is necessary for the planner on the access node to understand what the properties are of the data coming back from data nodes. For instance, in the case of DISTINCT ON(space_key) push down, the combined result from all data nodes is already distinct. However, without exposing this in the plan, the access node will always believe it has to execute a Unique node on the access node again.

Also note that in recent versions of the postgres_fdw some support for ordering and final (LIMIT) push down has been added, so it can provide a blueprint for how to do it. However, it needs to be adapted for the multi-node case. For instance, both DISTINCT and LIMIT push down have different implications when pushing down to multiple nodes compared to a single remote node.

erimatnor commented 2 years ago

Related issue: https://github.com/timescale/timescaledb/issues/3804

nikkhils commented 2 years ago

The costing code has been updated to use the latest from PG14 fdw code base.

However, the create_partitionwise_grouping_paths function sets the fdw_private field of the child_grouped_rel variable and this is NOT carried to the main UPPER grouped_rel. Because of this issue when we reach the ORDER BY part in create_ordered_paths, this grouped_rel does not have fdw_private set. So, when it becomes the input rel for the timescale_create_upper_paths_hook function with stage=UPPERREL_ORDERED we cannot do much because we do not know if this needs to be pushed down to the datanodes. This is a big issue that I am facing currently.

jfjoly commented 2 years ago

@nikkhils could you post an update or a link to the current state as we will move to the elasticity priority this week?

nikkhils commented 2 years ago

Here are the latest updates:

PR #4119 has been converted into a DRAFT for now. We don't need those changes right away.

PR #4152 has been created to add initial support for handling UPPER RELS in our code base. However, further testing proves that some more stabilization and debugging is required to get this is proper shape.

One can set total_cost to an arbitrary value like 0.1 in both add_foreign_ordered_paths and add_foreign_final_paths functions after the cost has been estimated in this PR. This will allow our custom datanode scan path to be chosen for one DN query cases. However, set_plan_references PANICS when the actual plan is being set up at the very end of the planning. It basically does not find the var references appropriately. So this is again which some needs investigation. But I believe we are moving in the right direction for this optimization with this approach. My plan is spend some time on the side whenever I can while focusing on the rest of the immediate work items right now.

NOTE:

Even for the case where we do not set enable_partitionwise_aggregate, we should be able to optimize if only ONE datanode is involved in the query. If there's no local AN dependency then the entire query can be sent to the remote DN and its execution results be returned back to the AN. However this is easier said than done and will need careful changes in our code base.

jfjoly commented 2 years ago

We discussed it and decided to pause the work on this to focus on elasticity.

svenklemm commented 1 month ago

multinode has been removed