erimatnor closed this issue 1 month ago
Possibly related issue: https://github.com/timescale/timescaledb/issues/2733
So first impressions:
- `ORDER BY` clauses are being pushed down to the data nodes as required in most queries
- `LIMIT` clauses are being pushed down to the data nodes as required in most queries

Are there any known cases where the sorts are not being pushed down to the data nodes?
I looked at #2733, but that looks more like a case where the hypertables are compressed on the data node. Additionally, from the plan it appears that a parallel seqscan is being chosen, which will be very slow. One reason for the seqscan being chosen is that the sort column in the query is part of neither `timescaledb.compress_orderby` nor `timescaledb.compress_segmentby`. I have asked the reporter for clarification on this.
@ryanbooz please let us know if you have feedback here as well
In later PostgreSQL versions, an FDW is asked to compute paths for the ORDER (`UPPERREL_ORDERED`) and LIMIT (`UPPERREL_FINAL`) cases. These weren't available in PG11's postgres_fdw, upon which much of the costing and push-down logic in TimescaleDB is based.
We might want to consider supporting these upperrel computations once we've dropped support for all PG versions that don't support these remote upperrel types.
Proposal: Break this into smaller subtasks:
multinode has been removed
What
There aren't many cases where pushing something down to data nodes isn't desirable. Even if the net benefit is zero (local and remote cost is equal), push-down should be preferred in order to offload processing to data nodes.
There are many areas to improve here:
In particular, sorts aren't always pushed down to data nodes since the planner's costing model isn't great for this use case.
Why
The planner currently has no concept of asynchronous execution of queries across data nodes. Instead, it believes the cost of a sort on the access node is about the same as dividing the sort across N data nodes. To illustrate this in a simplified way, imagine that the cost of sorting one tuple is `S` and there are `N` tuples; then the total cost is:

`S * N`
If there are two data nodes, and we sort half of the tuples on each node, then the cost is:

`2 * (S * N/2) = S * N`
Thus, the cost is the same irrespective of whether we sort on the access node or on the data nodes, so there is no obvious reason for the planner to pick the push-down case. Note that sorting doesn't affect the number of tuples transferred, so the transfer cost is the same whether the tuples arrive in order or out of order.
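To make the arithmetic concrete, here is a small toy sketch (purely illustrative, not TimescaleDB's actual cost model; the values of `S` and `N` are arbitrary) showing that the simplified model assigns the same total cost to both plans:

```python
# Illustrative only: a toy version of the simplified per-tuple cost model
# described above, not TimescaleDB's planner code.

def local_sort_cost(S, N):
    # Sort all N tuples on the access node: cost S per tuple.
    return S * N

def pushed_down_sort_cost(S, N, num_data_nodes):
    # Each data node sorts N / num_data_nodes tuples; the naive model
    # sums the per-node costs, so the total is unchanged.
    per_node = S * (N / num_data_nodes)
    return num_data_nodes * per_node

S, N = 0.01, 1_000_000
assert local_sort_cost(S, N) == pushed_down_sort_cost(S, N, 2)
print(local_sort_cost(S, N))  # same total cost either way
```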
One reason for the failure of the current costing model is that it doesn't account for parallelism. A query has a cost both in CPU and in latency. While the cost in CPU is essentially constant, the cost in latency is not. In other words, the number of CPU cycles spent sorting on the AN vs. dividing the sort across N data nodes is roughly the same (as the example above shows), but the latency is not. If we accounted only for latency, then the cost would be `S*N/#datanodes` (assuming perfect parallelism and an equal number of tuples on each node).

How
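A latency-oriented variant of the same toy model (again purely illustrative, under the stated assumptions of perfect parallelism and an equal number of tuples per node) would instead charge the pushed-down sort only for one node's share of the work:

```python
# Illustrative only: latency-based cost under the assumptions in the text
# (perfect parallelism, equal number of tuples per data node).

def latency_cost(S, N, num_data_nodes):
    # Data nodes sort concurrently, so elapsed time is governed by one
    # node's share of the work: S * N / num_data_nodes.
    return S * N / num_data_nodes

S, N = 0.01, 1_000_000
print(latency_cost(S, N, 1))  # sorting everything on the access node
print(latency_cost(S, N, 2))  # half the latency with two data nodes
```

Under this view the push-down plan is strictly cheaper whenever there is more than one data node, which matches the intuition in the text.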
Proposal: Break this into smaller subtasks:
Notes
Here are a couple of ideas on how to improve the cost model:
References