yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
9.01k stars 1.07k forks source link

[ysql] Cannot create index on "range" column. Cannot include "range" column in PK constraint. #7353

Open bllewell opened 3 years ago

bllewell commented 3 years ago

Jira Link: DB-2563 Creating this for tracking purposes.

Observed using single-node YB-2.4.0.0 cluster on macOS.

\set VERBOSITY verbose
create table t(k int primary key, r int4range not null);
create index t_r1 on t(r);

The "create index" attempt causes the 0A000 error. (This is mapped to the feature_not_supported exception.) The error text is "INDEX on column of type 'INT4RANGE' not yet supported". Corresponding attempts using the other range data types (int8range, numrange, tsrange, tstzrange, and daterange) all cause the 0A000 error with data-type-specific error texts.)

Similarly, this attempt (and for the other range data types):

create table t(r int4range primary key);

also cause the the 0A000 error.

All of these examples run without error on vanilla Postgress 11.2.

jaki commented 3 years ago

See ybctype.c

    { INT4RANGEOID, YB_YQL_DATA_TYPE_BINARY, false, -1,

false meaning not able to be primary key.

tedyu commented 3 years ago

This seems to be related to #6606

bllewell commented 3 years ago

Interesting. I wouldn't know (from the outside).

EmiPhil commented 3 years ago

Sadly there is also no good way (that I've found) to trick yugabyte into using an index with the range operators. You can create an index on the lower() and upper() bounds of a range column, but then using a predicate like range @> val doesn't use the index (which, of course it doesn't...you aren't using those functions with the range operators).

This is a real pain point if you want to speed up these kinds of queries because the workaround is to rebuild the well defined operators yourself and hope for the best. I'm using tstzrange frequently in my set up and so this bites me every time I'm working on queries. In certain cases, the predicates required to emulate what would otherwise be a single operator can get quite lengthy and hard to verify!

create table test (
    range int4range
);

create index testr
on test (lower(range) desc, upper(range) desc);

insert into test (range)
    select int4range(series.A, series.A + 5)
    from generate_series(0, 1000000, 5) as series(A);

explain analyse select * from test where range @> 190423;
--+------------------------------------------------------------------------------------------------------+
--|QUERY PLAN                                                                                            |
--+------------------------------------------------------------------------------------------------------+
--|Seq Scan on test  (cost=0.00..102.50 rows=1000 width=32) (actual time=111.671..753.082 rows=1 loops=1)|
--|  Filter: (range @> 190423)                                                                           |
--|  Rows Removed by Filter: 200000                                                                      |
--|Planning Time: 0.032 ms                                                                               |
--|Execution Time: 753.135 ms                                                                            |
--+------------------------------------------------------------------------------------------------------+

explain analyse select * from test where lower(range) <= 190423 and upper(range) > 190423;
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN                                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+
--|Index Scan using testr on test  (cost=0.00..5.30 rows=10 width=32) (actual time=37.187..37.191 rows=1 loops=1)|
--|  Index Cond: ((lower(range) <= 190423) AND (upper(range) > 190423))                                          |
--|Planning Time: 0.538 ms                                                                                       |
--|Execution Time: 37.251 ms                                                                                     |
--+--------------------------------------------------------------------------------------------------------------+

What is that like a 95% speedup for not using the operator? Easy choice.....but wait! Those aren't really equivalents, now, are they?? As far as I can currently think, this is a closer equivalent:

explain analyse select * from test
where
      -- cover all 4 inc/exclusive possibilities where both sides of the range are not null
     (lower_inc(range) and not upper_inc(range) and lower(range) <= 190423 and upper(range) > 190423)          -- [,)
or   (lower_inc(range) and upper_inc(range) and lower(range) <= 190423 and upper(range) >= 190423)             -- [,]
or   (not lower_inc(range) and upper_inc(range) and lower(range) < 190423 and upper(range) >= 190423)          -- (,]
or   (not lower_inc(range) and not upper_inc(range) and lower(range) < 190423 and upper(range) > 190423)       -- (,)
      -- and the 2 where lower is null
or   (not upper_inc(range) and lower(range) is null and upper(range) > 190423)                                 -- ,)
or   (upper_inc(range) and lower(range) is null and upper(range) >= 190423)                                    -- ,]
      -- and the 2 where upper is null
or   (not lower_inc(range) and lower(range) < 190423 and upper(range) is null)                                 -- (,
or   (lower_inc(range) and lower(range) <= 190423 and upper(range) is null)                                    -- [,
     -- and where both are null
or   (lower(range) is null and upper(range) is null);

It would be funny if it wasn't so sad. Of course, this query no longer uses the index.

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Seq Scan on test  (cost=0.00..205.00 rows=1000 width=32) (actual time=114.360..791.495 rows=1 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|  Filter: ((lower_inc(range) AND (NOT upper_inc(range)) AND (lower(range) <= 190423) AND (upper(range) > 190423)) OR (lower_inc(range) AND upper_inc(range) AND (lower(range) <= 190423) AND (upper(range) >= 190423)) OR ((NOT lower_inc(range)) AND upper_inc(range) AND (lower(range) < 190423) AND (upper(range) >= 190423)) OR ((NOT lower_inc(range)) AND (NOT upper_inc(range)) AND (lower(range) < 190423) AND (upper(range) > 190423)) OR ((NOT upper_inc(range)) AND (lower(range) IS NULL) AND (upper(range) > 190423)) OR (upper_inc(range) AND (lower(range) IS NULL) AND (upper(range) >= 190423)) OR ((NOT lower_inc(range)) AND (lower(range) < 190423) AND (upper(range) IS NULL)) OR (lower_inc(range) AND (lower(range) <= 190423) AND (upper(range) IS NULL)) OR ((lower(range) IS NULL) AND (upper(range) IS NULL)))|
|  Rows Removed by Filter: 200000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|Planning Time: 0.062 ms                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|Execution Time: 791.539 ms                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

But that's okay, it's the ors that are preventing it so we can fix this with more sql. Is there anything that can't be fixed with more sql? HAHAaha

explain analyse
select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= 190423 and upper(range) > 190423
    union all
select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= 190423 and upper(range) >= 190423
    union all
select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < 190423 and upper(range) >= 190423
    union all
select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < 190423 and upper(range) > 190423
    union all
select * from test where not upper_inc(range) and lower(range) is null and upper(range) > 190423
    union all
select * from test where upper_inc(range) and lower(range) is null and upper(range) >= 190423
    union all
select * from test where not lower_inc(range) and lower(range) < 190423 and upper(range) is null
    union all
select * from test where lower_inc(range) and lower(range) <= 190423 and upper(range) is null
    union all
select * from test where lower(range) is null and upper(range) is null;
+---------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------+
|Append  (cost=0.00..49.50 rows=90 width=32) (actual time=32.430..184.608 rows=1 loops=1)                                   |
|  ->  Index Scan using testr on test  (cost=0.00..5.40 rows=10 width=32) (actual time=32.429..32.432 rows=1 loops=1)       |
|        Index Cond: ((lower(range) <= 190423) AND (upper(range) > 190423))                                                 |
|        Filter: (lower_inc(range) AND (NOT upper_inc(range)))                                                              |
|  ->  Index Scan using testr on test test_1  (cost=0.00..5.40 rows=10 width=32) (actual time=31.796..31.796 rows=0 loops=1)|
|        Index Cond: ((lower(range) <= 190423) AND (upper(range) >= 190423))                                                |
|        Filter: (lower_inc(range) AND upper_inc(range))                                                                    |
|        Rows Removed by Filter: 1                                                                                          |
|  ->  Index Scan using testr on test test_2  (cost=0.00..5.40 rows=10 width=32) (actual time=28.206..28.206 rows=0 loops=1)|
|        Index Cond: ((lower(range) < 190423) AND (upper(range) >= 190423))                                                 |
|        Filter: ((NOT lower_inc(range)) AND upper_inc(range))                                                              |
|        Rows Removed by Filter: 1                                                                                          |
|  ->  Index Scan using testr on test test_3  (cost=0.00..5.40 rows=10 width=32) (actual time=27.761..27.761 rows=0 loops=1)|
|        Index Cond: ((lower(range) < 190423) AND (upper(range) > 190423))                                                  |
|        Filter: ((NOT lower_inc(range)) AND (NOT upper_inc(range)))                                                        |
|        Rows Removed by Filter: 1                                                                                          |
|  ->  Index Scan using testr on test test_4  (cost=0.00..5.33 rows=10 width=32) (actual time=0.468..0.469 rows=0 loops=1)  |
|        Index Cond: ((lower(range) IS NULL) AND (upper(range) > 190423))                                                   |
|        Filter: (NOT upper_inc(range))                                                                                     |
|  ->  Index Scan using testr on test test_5  (cost=0.00..5.33 rows=10 width=32) (actual time=0.400..0.400 rows=0 loops=1)  |
|        Index Cond: ((lower(range) IS NULL) AND (upper(range) >= 190423))                                                  |
|        Filter: upper_inc(range)                                                                                           |
|  ->  Index Scan using testr on test test_6  (cost=0.00..5.33 rows=10 width=32) (actual time=27.423..27.423 rows=0 loops=1)|
|        Index Cond: ((lower(range) < 190423) AND (upper(range) IS NULL))                                                   |
|        Filter: (NOT lower_inc(range))                                                                                     |
|  ->  Index Scan using testr on test test_7  (cost=0.00..5.33 rows=10 width=32) (actual time=35.735..35.735 rows=0 loops=1)|
|        Index Cond: ((lower(range) <= 190423) AND (upper(range) IS NULL))                                                  |
|        Filter: lower_inc(range)                                                                                           |
|  ->  Index Scan using testr on test test_8  (cost=0.00..5.25 rows=10 width=32) (actual time=0.375..0.375 rows=0 loops=1)  |
|        Index Cond: ((lower(range) IS NULL) AND (upper(range) IS NULL))                                                    |
|Planning Time: 0.163 ms                                                                                                    |
|Execution Time: 184.695 ms                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------+

Performance took a hit but we are still running circles around the non indexed operator. Sadly this doesn't even handle the case where you are checking if a range contains another range.That's okay let's keep going

create or replace function contains(value int4)
    returns setof test
    language plpgsql
as
$$
begin
    return query select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= value and upper(range) > value;
    return query select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= value and upper(range) >= value;
    return query select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < value and upper(range) >= value;
    return query select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < value and upper(range) > value;
    return query select * from test where not upper_inc(range) and lower(range) is null and upper(range) > value;
    return query select * from test where upper_inc(range) and lower(range) is null and upper(range) >= value;
    return query select * from test where not lower_inc(range) and lower(range) < value and upper(range) is null;
    return query select * from test where lower_inc(range) and lower(range) <= value and upper(range) is null;
    return query select * from test where lower(range) is null and upper(range) is null;
    return;
end;
$$;
create or replace function contains(value int4range)
    returns setof test
    language plpgsql
as
$$
begin
    return query select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= lower(value) and upper(range) > upper(value);
    return query select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= lower(value) and upper(range) >= upper(value);
    return query select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < lower(value) and upper(range) >= upper(value);
    return query select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < lower(value) and upper(range) > upper(value);
    return query select * from test where not upper_inc(range) and lower(range) is null and upper(range) > upper(value);
    return query select * from test where upper_inc(range) and lower(range) is null and upper(range) >= upper(value);
    return query select * from test where not lower_inc(range) and lower(range) < lower(value) and upper(range) is null;
    return query select * from test where lower_inc(range) and lower(range) <= lower(value) and upper(range) is null;
    return query select * from test where lower(range) is null and upper(range) is null;
    return;
end;
$$;
explain analyse
select * from contains(190423);
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN                                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains  (cost=0.25..10.25 rows=1000 width=32) (actual time=178.591..178.592 rows=1 loops=1)|
--|Planning Time: 0.022 ms                                                                                       |
--|Execution Time: 178.604 ms                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+

explain analyse
select * from contains(int4range(190420, 190424, '[)'));
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN                                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains  (cost=0.25..10.25 rows=1000 width=32) (actual time=173.604..173.604 rows=1 loops=1)|
--|Planning Time: 0.042 ms                                                                                       |
--|Execution Time: 173.621 ms                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+

Looking good except that you'd have to create one per table per range. Yuck!

create or replace function contains(tbl anyelement, range text, value int4)
    returns setof anyelement
    language plpgsql
as
$$
declare
    r text;
begin
    r := quote_ident(range);
    return query execute format('select * from %s where lower_inc(%s) and not upper_inc(%s) and lower(%s) <= %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, value, r, value);
    return query execute format('select * from %s where lower_inc(%s) and upper_inc(%s) and lower(%s) <= %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value, r, value);
    return query execute format('select * from %s where not lower_inc(%s) and upper_inc(%s) and lower(%s) < %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value, r, value);
    return query execute format('select * from %s where not lower_inc(%s) and not upper_inc(%s) and lower(%s) < %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, value, r, value);
    return query execute format('select * from %s where not upper_inc(%s) and lower(%s) is null and upper(%s) > %L', pg_typeof(tbl), r, r, r, value);
    return query execute format('select * from %s where upper_inc(%s) and lower(%s) is null and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value);
    return query execute format('select * from %s where not lower_inc(%s) and lower(%s) < %L and upper(%s) is null', pg_typeof(tbl), r, r, value, r);
    return query execute format('select * from %s where lower_inc(%s) and lower(%s) <= %L and upper(%s) is null', pg_typeof(tbl), r, r, value, r);
    return query execute format('select * from %s where lower(%s) is null and upper(%s) is null', pg_typeof(tbl), r, r);
    return;
end;
$$;
create or replace function contains(tbl anyelement, range text, value int4range)
    returns setof anyelement
    language plpgsql
as
$$
declare
    r text;
begin
    r := quote_ident(range);
    return query execute format('select * from %s where lower_inc(%s) and not upper_inc(%s) and lower(%s) <= %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
    return query execute format('select * from %s where lower_inc(%s) and upper_inc(%s) and lower(%s) <= %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
    return query execute format('select * from %s where not lower_inc(%s) and upper_inc(%s) and lower(%s) < %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
    return query execute format('select * from %s where not lower_inc(%s) and not upper_inc(%s) and lower(%s) < %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
    return query execute format('select * from %s where not upper_inc(%s) and lower(%s) is null and upper(%s) > %L', pg_typeof(tbl), r, r, r, upper(value));
    return query execute format('select * from %s where upper_inc(%s) and lower(%s) is null and upper(%s) >= %L', pg_typeof(tbl), r, r, r, upper(value));
    return query execute format('select * from %s where not lower_inc(%s) and lower(%s) < %L and upper(%s) is null', pg_typeof(tbl), r, r, lower(value), r);
    return query execute format('select * from %s where lower_inc(%s) and lower(%s) <= %L and upper(%s) is null', pg_typeof(tbl), r, r, lower(value), r);
    return query execute format('select * from %s where lower(%s) is null and upper(%s) is null', pg_typeof(tbl), r, r);
    return;
end;
$$;

explain analyse
select * from contains(null::test, 'range', 190423);
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN                                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains  (cost=0.25..10.25 rows=1000 width=32) (actual time=236.467..236.468 rows=1 loops=1)|
--|Planning Time: 0.022 ms                                                                                       |
--|Execution Time: 236.508 ms                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+

explain analyse
select * from contains(null::test, 'range', int4range(190420, 190424, '[)'));
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN                                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains  (cost=0.25..10.25 rows=1000 width=32) (actual time=224.615..224.616 rows=1 loops=1)|
--|Planning Time: 0.031 ms                                                                                       |
--|Execution Time: 224.628 ms                                                                                    |
--+--------------------------------------------------------------------------------------------------------------+

And the includes operator is probably the easiest one to implement! :crying_cat_face:

When yuga is able to index range queries send me a postcard at the Mountains of Madness where me and the ghost of Richard Snodgrass will be working on hand written range queries

bllewell commented 3 years ago

@EmiPhil, thanks for doing this study!