erimatnor closed this issue 1 month ago
Related issue: https://github.com/timescale/timescaledb/issues/3804
The costing code has been updated to use the latest from the PG14 fdw code base. However, the `create_partitionwise_grouping_paths` function sets the `fdw_private` field of the `child_grouped_rel` variable, and this is NOT carried over to the main upper `grouped_rel`. Because of this, when we reach the ORDER BY handling in `create_ordered_paths`, the `grouped_rel` does not have `fdw_private` set. So, when it becomes the input rel for the `timescale_create_upper_paths_hook` function with `stage=UPPERREL_ORDERED`, we cannot do much because we do not know whether this needs to be pushed down to the data nodes. This is the main issue I am facing currently.
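The propagation gap described above can be sketched with a toy model. The structs and functions below are hypothetical stand-ins (not the real planner's `RelOptInfo` or hook), reduced to the one field that matters:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for PostgreSQL's RelOptInfo, keeping only the field
 * needed to illustrate the fdw_private propagation gap. Hypothetical,
 * not the real planner struct. */
typedef struct ToyRel
{
	void	   *fdw_private;	/* per-relation FDW planning state */
} ToyRel;

/* Mimics the behavior described for create_partitionwise_grouping_paths:
 * the child grouped rel gets fdw_private, but nothing copies it up. */
void
toy_partitionwise_grouping(ToyRel *parent_grouped_rel,
						   ToyRel *child_grouped_rel,
						   void *fdw_state)
{
	child_grouped_rel->fdw_private = fdw_state;
	/* The gap being described: parent_grouped_rel->fdw_private stays NULL */
}

/* Mimics the upper-paths hook's dilemma: without fdw_private on the
 * input rel, it cannot tell whether push-down should be considered. */
int
toy_can_consider_pushdown(const ToyRel *input_rel)
{
	return input_rel->fdw_private != NULL;
}
```

Running this model shows the child rel is push-down capable while the parent upper rel, which is what the hook actually receives at `UPPERREL_ORDERED`, is not.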
@nikkhils could you post an update or a link to the current state, as we will move to the elasticity priority this week?
Here are the latest updates:
PR #4119 has been converted into a DRAFT for now. We don't need those changes right away.
PR #4152 has been created to add initial support for handling UPPER RELS in our code base. However, further testing shows that some more stabilization and debugging is required to get this into proper shape.
One can set `total_cost` to an arbitrary value like `0.1` in both the `add_foreign_ordered_paths` and `add_foreign_final_paths` functions after the cost has been estimated in this PR. This will allow our custom data node scan path to be chosen for single-DN query cases. However, `set_plan_references` PANICs when the actual plan is being set up at the very end of planning: it does not find the Var references appropriately. So this again needs some investigation. But I believe we are moving in the right direction for this optimization with this approach. My plan is to spend some time on this on the side whenever I can, while focusing on the rest of the immediate work items right now.
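The cost-forcing experiment mentioned above can be illustrated with a toy cost model. This is a hypothetical simplification, not `postgres_fdw`'s actual costing code:

```c
#include <assert.h>

/* Toy path holding only a total cost, to illustrate the debugging
 * hack: forcing total_cost to an arbitrarily low value so the remote
 * (data node scan) path wins path comparison. Hypothetical struct. */
typedef struct ToyPath
{
	double		total_cost;
} ToyPath;

/* Equivalent of setting total_cost = 0.1 in add_foreign_ordered_paths /
 * add_foreign_final_paths, for experimentation only: it bypasses the
 * estimated cost entirely. */
void
toy_force_cheap(ToyPath *path)
{
	path->total_cost = 0.1;
}

/* Minimal stand-in for the planner preferring the cheaper path. */
const ToyPath *
toy_cheapest(const ToyPath *a, const ToyPath *b)
{
	return (a->total_cost <= b->total_cost) ? a : b;
}
```

With the override applied, a remote path that was estimated as expensive beats any realistic local path, which is exactly why it is useful for testing and unusable as a permanent fix.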
Even for the case where we do not set `enable_partitionwise_aggregate`, we should be able to optimize if only ONE data node is involved in the query. If there is no local AN dependency, then the entire query can be sent to the remote DN and its execution results returned back to the AN. However, this is easier said than done and will need careful changes in our code base.
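The decision rule sketched above reduces to a simple predicate. This is a hypothetical illustration of the conditions named in the comment, not TimescaleDB's actual planner logic:

```c
#include <assert.h>

/* Toy predicate for the single-data-node optimization: the full query
 * can be shipped to the remote node only when exactly one data node is
 * involved AND the access node contributes nothing locally. Both
 * parameter names are hypothetical. */
int
toy_fully_pushable(int num_data_nodes, int has_local_an_dependency)
{
	return num_data_nodes == 1 && !has_local_an_dependency;
}
```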
We discussed it and decided to pause the work on this to focus on elasticity.
multinode has been removed
Currently, some multi-node push-down optimizations, such as LIMIT and DISTINCT, are applied at the deparsing stage, rather than first being planned as Upper relation Paths on the data node scan relations.
One reason for this is that earlier PostgreSQL versions didn't have support in the foreign data wrapper (and API) to generate paths for Upper relation kinds.
To properly implement push-down plans, we should generate the appropriate Paths (including their cost) when push-down is requested via partitionwise planning, for the following `UpperRelationKind`s:

- `UPPERREL_DISTINCT` - for pushing down distinct queries
- `UPPERREL_ORDERED` - for pushing down sorts
- `UPPERREL_FINAL` - for pushing down things like `LIMIT` and `OFFSET`

Future push-down optimizations might also involve upper relations like `UPPERREL_WINDOW` for pushing down window functions.

Once these upperrel Paths are created, the planner will generate the least-cost plan, and generating the remote SQL statement will simply be the task of deparsing that plan (for the data node scan) into a SQL statement.
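The stage dispatch implied by the list above can be sketched as a toy enum and switch. The real `UpperRelationKind` enum lives in PostgreSQL's `nodes/pathnodes.h`; the names below are illustrative stand-ins:

```c
#include <assert.h>

/* Toy mirror of a subset of PostgreSQL's UpperRelationKind, used to
 * sketch which upper-rel stages a push-down hook would handle. */
typedef enum ToyUpperKind
{
	TOY_UPPERREL_DISTINCT,		/* DISTINCT push down */
	TOY_UPPERREL_ORDERED,		/* ORDER BY push down */
	TOY_UPPERREL_FINAL,			/* LIMIT / OFFSET push down */
	TOY_UPPERREL_WINDOW			/* window functions: future work */
} ToyUpperKind;

/* Sketch of the dispatch: the three stages listed above get Paths
 * generated; everything else falls through untouched. */
int
toy_stage_handled(ToyUpperKind kind)
{
	switch (kind)
	{
		case TOY_UPPERREL_DISTINCT:
		case TOY_UPPERREL_ORDERED:
		case TOY_UPPERREL_FINAL:
			return 1;
		default:
			return 0;	/* e.g. window functions, not yet pushed down */
	}
}
```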
Note that adding the proper Paths is necessary for the planner on the access node to understand the properties of the data coming back from data nodes. For instance, in the case of `DISTINCT ON (space_key)` push down, the combined result from all data nodes is already distinct. However, without exposing this in the plan, the access node will always believe it has to execute a `Unique` node on the access node again.

Also note that in recent versions of `postgres_fdw`, some support for ordered and final (`LIMIT`) push down has been added, so it can provide a blueprint for how to do it. However, it needs to be adapted for the multi-node case. For instance, both DISTINCT and LIMIT push down have different implications when pushing down to multiple nodes compared to a single remote node.
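The `DISTINCT ON (space_key)` argument above hinges on a disjointness property: when `space_key` is the partitioning dimension, no two data nodes hold the same key value, so concatenating per-node distinct results is already distinct. A toy check of that property, with a hypothetical two-node layout:

```c
#include <assert.h>

/* Toy check of the disjointness property behind DISTINCT ON push down:
 * if the per-node key sets do not overlap, the concatenation of
 * per-node distinct results needs no extra Unique node on the access
 * node. Hypothetical data layout, two nodes only for simplicity. */
int
toy_concat_is_distinct(const int *node1_keys, int n1,
					   const int *node2_keys, int n2)
{
	for (int i = 0; i < n1; i++)
		for (int j = 0; j < n2; j++)
			if (node1_keys[i] == node2_keys[j])
				return 0;	/* overlap: access node must deduplicate */
	return 1;
}
```

Partitioning on `space_key` guarantees the disjoint case, which is precisely the property the access node's plan should expose instead of unconditionally adding a `Unique` node.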