Open timothy-e opened 2 years ago
Actually, trying to address the zero-selectivity problem by simply following what Postgres does, i.e. implementing get_actual_variable_range, may not be a good idea because:
Furthermore, assigning zero selectivity based on approximated stats, as opposed to a known-empty table or a predicate that always evaluates to false or unknown, is never a safe thing to do. If we assume those out-of-range values could have been added since the last analyze, we would be better off taking one row's worth of selectivity, or even half a row so that the estimated row count is smaller than a sure-thing 1-row estimate, i.e. 0.5 / the input cardinality.
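As a rough illustration of the "half a row" floor suggested above, here is a minimal sketch in Python. The helper name `floor_selectivity` and its `min_rows` parameter are hypothetical and exist only for this example; this is not existing Postgres or Yugabyte code.

```python
def floor_selectivity(estimated_sel: float, input_rows: float,
                      min_rows: float = 0.5) -> float:
    """Keep a stats-derived selectivity from collapsing to zero.

    Hypothetical helper: floors the selectivity at min_rows / input_rows,
    so the estimated row count never drops below min_rows.
    """
    if input_rows <= 0:
        # Known-empty input: there is nothing to protect against.
        return estimated_sel
    floor = min(1.0, min_rows / input_rows)
    return max(estimated_sel, floor)


# Out-of-range predicate on a 1,000,000-row table: the stale histogram says 0.
input_rows = 1_000_000
sel = floor_selectivity(0.0, input_rows)
rows = sel * input_rows  # about half a row instead of exactly zero
```

Selectivities that are already above the floor pass through unchanged, so the floor only matters for the out-of-range case discussed here.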
That makes a lot of sense.
Furthermore, assigning zero selectivity based on approximated stats, as opposed to a known-empty table or a predicate that always evaluates to false or unknown, is never a safe thing to do.
Does this mean that you think Postgres is doing unsafe things? Or that Postgres can safely return 0 because it works to determine the actual min/max, but since Yugabyte does not do that, Yugabyte cannot safely return 0?
... but since Yugabyte does not do that, Yugabyte cannot safely return 0?
My short answer would be yes, it's unsafe. We'd want to try assigning a very small selectivity there and see how it goes.
Since Postgres enforces the 1-row minimum here and there using clamp_row_est (another questionable practice that increases the rounding error a lot for complex queries), we may not see the difference between returning 0 and a very small selectivity value in many or all plans.
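To see why the two choices can be indistinguishable, here is a simplified Python model of the clamping behaviour. It mirrors only the core of clamp_row_est (at least one row, otherwise round to the nearest integer) and leaves out any other handling the real C function may do. The 1,000,000-row input is an assumed figure.

```python
def clamp_row_est(nrows: float) -> float:
    """Simplified model of Postgres's clamp_row_est():
    force at least one row, otherwise round to the nearest integer."""
    if nrows <= 1.0:
        return 1.0
    return float(round(nrows))


input_rows = 1_000_000  # assumed table size for illustration

# Selectivity of exactly zero versus the half-a-row selectivity.
zero_sel_rows = clamp_row_est(0.0 * input_rows)
small_sel_rows = clamp_row_est((0.5 / input_rows) * input_rows)

# Both estimates come out as 1.0 row, so any plan node that applies the
# clamp cannot tell the two selectivities apart.
```

A difference would only show up where the raw selectivity is used before a clamp is applied, for example when it is multiplied with other selectivities.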
Jira Link: DB-1256
Description
This is probably a very low priority item, but if we want to minimize query planning feature differences between LSM trees and btrees, it needs to be done.
Postgres uses the function get_actual_variable_range() to find the actual minimum and maximum of the column when accessing the first or last histogram bucket, which should yield more accurate results. It works on btree indexes only; otherwise, Postgres uses the histogram bounds. If we want feature / plan parity between LSM trees and btrees, then we would need to update this function to also return the proper bounds for LSM trees.
Note: Until https://phabricator.dev.yugabyte.com/D14558 is merged, it also runs for LSM trees, but will yield entirely incorrect values.
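To illustrate why the actual minimum matters for the first histogram bucket, here is a simplified Python model of a `col < const` estimate. It is not the Postgres implementation: the function name `lt_selectivity_first_bucket`, the linear interpolation within the bucket, and the sample bounds are all assumptions made for this sketch.

```python
def lt_selectivity_first_bucket(const, hist_bounds, actual_min=None):
    """Estimate the fraction of rows with value < const, assuming const
    falls in the first histogram bucket and buckets hold equal row counts.

    hist_bounds: sorted bucket boundaries from the last ANALYZE.
    actual_min:  the real column minimum when an index lookup can supply
                 it (the role get_actual_variable_range plays), else None.
    """
    nbuckets = len(hist_bounds) - 1
    low, high = hist_bounds[0], hist_bounds[1]
    if actual_min is not None:
        low = actual_min  # replace the stale bound with the actual value
    if const <= low:
        return 0.0
    if const >= high:
        return 1.0 / nbuckets
    # Linear interpolation inside the first bucket.
    bucket_fraction = (const - low) / (high - low)
    return bucket_fraction / nbuckets


bounds = [0, 100, 200, 300, 400]  # assumed histogram with 4 buckets

stale = lt_selectivity_first_bucket(10, bounds)
fresh = lt_selectivity_first_bucket(10, bounds, actual_min=-100)
# With the stale bound the estimate is 10/100 of one bucket; with the
# actual minimum of -100 it is 110/200 of one bucket.
```

In this model, an index type that cannot supply `actual_min` always takes the stale-bound path, which is the gap between btrees and LSM trees described above.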
Example:
First, enable the function for LSM trees: Change
to
Then, execute the following SQL. (Requires https://phabricator.dev.yugabyte.com/D14558)