erimatnor closed this issue 1 month ago
Possibly related issue: https://github.com/timescale/timescaledb/issues/2733
So first impressions:
- `ORDER BY` clauses are being pushed down to the data nodes as required in most queries
- `LIMIT` clauses are being pushed down to the data nodes as required in most queries

Are there any known cases where the sorts are not being pushed down to the data nodes?
I looked at #2733, but that looks more like a case where the hypertables are compressed on the data node. Additionally, from the plan it appears that a parallel seqscan is being chosen, which will be very slow. One reason for the seqscan being chosen is that the sort column in the query is part of neither `timescaledb.compress_orderby` nor `timescaledb.compress_segmentby`. I have asked the reporter for clarification on this.
@ryanbooz please let us know if you have feedback here as well
In later PostgreSQL versions, an FDW is asked to compute paths for the ORDER (`UPPERREL_ORDERED`) and LIMIT (`UPPERREL_FINAL`) cases. These weren't available in PG11's postgres_fdw, upon which much of the costing and push-down logic in TimescaleDB is based.
We might want to consider supporting these upperrel computations once we've dropped support for all PG versions that don't support these remote upperrel types.
Proposal: Break this into smaller subtasks:
multinode has been removed
What
There aren't many cases where pushing something down to data nodes isn't desirable. Even if the net benefit is zero (local and remote cost is equal), push-down should be preferred in order to offload processing to data nodes.
There are many areas to improve here:
In particular, sorts aren't always pushed down to data nodes since the planner's costing model isn't great for this use case.
Why
The planner currently has no concept of asynchronous execution of queries across data nodes. Instead, it believes the cost of a sort on the access node is about the same as dividing the sort across N data nodes. To illustrate this in a simplified way, imagine that the cost of sorting one tuple is `S` and there are `N` tuples; then the total cost is:

`S * N`
If there are two data nodes, and we sort half of the tuples on each node, then the cost is:

`2 * (S * N/2) = S * N`
Thus, the cost is the same irrespective of whether we sort on the access node or on the data nodes, so there is no obvious reason for the planner to pick the push-down case. Note that sorting doesn't affect the number of tuples transferred, so the transfer cost is the same whether the tuples arrive in order or out of order.
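To make the arithmetic concrete, here is a small toy sketch (purely illustrative, not TimescaleDB's actual cost model; the values of `S` and `N` are arbitrary) showing that the simplified model assigns the same total cost to both plans:

```python
# Illustrative only: a toy version of the simplified per-tuple cost model
# described above, not TimescaleDB's planner code.

def local_sort_cost(S, N):
    # Sort all N tuples on the access node: cost S per tuple.
    return S * N

def pushed_down_sort_cost(S, N, num_data_nodes):
    # Each data node sorts N / num_data_nodes tuples; the naive model
    # sums the per-node costs, so the total is unchanged.
    per_node = S * (N / num_data_nodes)
    return num_data_nodes * per_node

S, N = 0.01, 1_000_000
assert local_sort_cost(S, N) == pushed_down_sort_cost(S, N, 2)
print(local_sort_cost(S, N))  # same total cost either way
```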
One reason for the failure of the current costing model is that it doesn't account for parallelism. A query has a cost both in CPU and in latency. While the cost in CPU is essentially constant, the cost in latency is not. In other words, the number of CPU cycles spent sorting on the AN vs. dividing the sort across N data nodes is roughly the same (as the example above shows), but the latency is not. If we accounted only for latency, then the cost would be `S*N/#datanodes` (assuming perfect parallelism and an equal number of tuples on each node).

How
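A latency-oriented variant of the same toy model (again purely illustrative, under the stated assumptions of perfect parallelism and an equal number of tuples per node) would instead charge the pushed-down sort only for one node's share of the work:

```python
# Illustrative only: latency-based cost under the assumptions in the text
# (perfect parallelism, equal number of tuples per data node).

def latency_cost(S, N, num_data_nodes):
    # Data nodes sort concurrently, so elapsed time is governed by one
    # node's share of the work: S * N / num_data_nodes.
    return S * N / num_data_nodes

S, N = 0.01, 1_000_000
print(latency_cost(S, N, 1))  # sorting everything on the access node
print(latency_cost(S, N, 2))  # half the latency with two data nodes
```

Under this view the push-down plan is strictly cheaper whenever there is more than one data node, which matches the intuition in the text.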
Proposal: Break this into smaller subtasks:
Notes
Here are a couple of ideas on how to improve the cost model:
References