timothy-e commented 2 years ago

Jira Link: DB-1256

Description

This is probably a very low priority item, but if we want to minimize query planning feature differences between LSM trees and btrees, it needs to be done.

Postgres use the function get_actual_variable_range() to find the actual minimum and maximum of the column when accessing the first or last histogram bucket, which should yield more accurate results. It works on btree indexes only, and Postgres will use the histogram bounds otherwise.

If we want feature / plan parity between LSM trees and btrees, then we would need to update this function to also return the proper bounds for LSM trees.

Note: Until https://phabricator.dev.yugabyte.com/D14558 is merged, it also runs for LSM trees, but will yield entirely incorrect values.

Example:

First, enable the function for LSM trees: Change

/* Ignore non-btree indexes */
if (index->relam != BTREE_AM_OID)
    continue;

to

/* Ignore non-btree indexes */
if (index->relam != BTREE_AM_OID && index->relam != LSM_AM_OID)
    continue;

Then, execute the following SQL. (Requires https://phabricator.dev.yugabyte.com/D14558)

CREATE TABLE t_10000 (id int PRIMARY KEY, val int);
INSERT INTO t_10000 SELECT i, i FROM (SELECT generate_series(1, 10000) i) t;
ANALYZE;
SET yb_enable_table_statistics_for_join = TRUE;
SET yb_enable_table_statistics_for_plan = TRUE;

EXPLAIN SELECT * FROM t_10000 WHERE id < 101; -- reports 100 rows as expected because we don't need to access the first histogram bucket
EXPLAIN SELECT * FROM t_10000 WHERE id < 100; -- reports 0 rows because the binary search uses the first histogram bucket 
EXPLAIN SELECT * FROM t_10000 WHERE id > 9799; -- reports 201 rows as expected because we don't need to access the last histogram bucket
EXPLAIN SELECT * FROM t_10000 WHERE id > 9800; -- reports 0 rows because the binary search need to access the last histogram bucket

mtakahar commented 2 years ago

Actually, trying to address the zero-selectivity problem by simply following what postgres does, i.e.: implementing get_actual_variable_range, may not be a good idea because:

Getting the actual min/max values involves network communication overhead - making sure to cache the value, if not done yet, should make it less expensive however, an extra round trip per query compilation may not be negligible with a highly concurrent point query workload, etc. (especially, postgres does not share the compiled execution plans across different clients/sessions.)
Extending the first/last histogram bucket with the actual values means we are diluting the estimates only for these buckets - adding inconsistencies between those end buckets vs. all others.

Furthermore, assigning the zero-selectivity based on the approxiamted stats, as opposed to a known empty table, a predicate that always evaluates to false or unknown, is never safe thing to do. If we are to assume those out-of-the-range values would have been added since the last analyze, it would be better off just taking one-row worth selectivity, or even half a row to make the estimated row count smaller than a sure thing 1-row estimate. i.e. 0.5 / the input cardinality, etc.

timothy-e commented 2 years ago

That makes a lot of sense.

Furthermore, assigning the zero-selectivity based on the approxiamted stats, as opposed to a known empty table, a predicate that always evaluates to false or unknown, is never safe thing to do.

Does this mean that you think Postgres is doing unsafe things? Or that Postgres can safely return 0 because it works to determine the actual min/max, but since Yugabyte does not do that, Yugabyte cannot safely return 0?

mtakahar commented 2 years ago

... but since Yugabyte does not do that, Yugabyte cannot safely return 0? My short answer would be yes, it's unsafe. We'd want to try assigning a very small selectivity there and see how it goes.

Since postgres enforces the 1-row minimum here and there using clamp_row_est (another questionable practice that increases the rounding a lot for complex queries), we may not see the difference between returning 0 and very small selectivity value in many or all plans.

yugabyte / yugabyte-db

[YSQL] Update `selfuncs.c::get_actual_variable_range()` to support LSM trees #11068

Description

Example: