Open bllewell opened 3 years ago
See ybctype.c
{ INT4RANGEOID, YB_YQL_DATA_TYPE_BINARY, false, -1,
false
meaning not able to be primary key.
This seems to be related to #6606
Interesting. I wouldn't know (from the outside).
Sadly there is also no good way (that I've found) to trick yugabyte into using an index with the range operators. You can create an index on the lower()
and upper()
bounds of a range column, but then using a predicate like range @> val
doesn't use the index (which, of course it doesn't...you aren't using those functions with the range operators).
This is a real pain point if you want to speed up these kinds of queries because the workaround is to rebuild the well defined operators yourself and hope for the best. I'm using tstzrange
frequently in my set up and so this bites me every time I'm working on queries. In certain cases, the predicates required to emulate what would otherwise be a single operator can get quite lengthy and hard to verify!
create table test (
range int4range
);
create index testr
on test (lower(range) desc, upper(range) desc);
insert into test (range)
select int4range(series.A, series.A + 5)
from generate_series(0, 1000000, 5) as series(A);
explain analyse select * from test where range @> 190423;
--+------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+------------------------------------------------------------------------------------------------------+
--|Seq Scan on test (cost=0.00..102.50 rows=1000 width=32) (actual time=111.671..753.082 rows=1 loops=1)|
--| Filter: (range @> 190423) |
--| Rows Removed by Filter: 200000 |
--|Planning Time: 0.032 ms |
--|Execution Time: 753.135 ms |
--+------------------------------------------------------------------------------------------------------+
explain analyse select * from test where lower(range) <= 190423 and upper(range) > 190423;
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Index Scan using testr on test (cost=0.00..5.30 rows=10 width=32) (actual time=37.187..37.191 rows=1 loops=1)|
--| Index Cond: ((lower(range) <= 190423) AND (upper(range) > 190423)) |
--|Planning Time: 0.538 ms |
--|Execution Time: 37.251 ms |
--+--------------------------------------------------------------------------------------------------------------+
What is that like a 95% speedup for not using the operator? Easy choice.....but wait! Those aren't really equivalents, now, are they?? As far as I can currently think, this is a closer equivalent:
explain analyse select * from test
where
-- cover all 4 inc/exclusive possibilities where both sides of the range are not null
(lower_inc(range) and not upper_inc(range) and lower(range) <= 190423 and upper(range) > 190423) -- [,)
or (lower_inc(range) and upper_inc(range) and lower(range) <= 190423 and upper(range) >= 190423) -- [,]
or (not lower_inc(range) and upper_inc(range) and lower(range) < 190423 and upper(range) >= 190423) -- (,]
or (not lower_inc(range) and not upper_inc(range) and lower(range) < 190423 and upper(range) > 190423) -- (,)
-- and the 2 where lower is null
or (not upper_inc(range) and lower(range) is null and upper(range) > 190423) -- ,)
or (upper_inc(range) and lower(range) is null and upper(range) >= 190423) -- ,]
-- and the 2 where upper is null
or (not lower_inc(range) and lower(range) < 190423 and upper(range) is null) -- (,
or (lower_inc(range) and lower(range) <= 190423 and upper(range) is null) -- [,
-- and where both are null
or (lower(range) is null and upper(range) is null);
It would be funny if it wasn't so sad. Of course, this query no longer uses the index.
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Seq Scan on test (cost=0.00..205.00 rows=1000 width=32) (actual time=114.360..791.495 rows=1 loops=1) |
| Filter: ((lower_inc(range) AND (NOT upper_inc(range)) AND (lower(range) <= 190423) AND (upper(range) > 190423)) OR (lower_inc(range) AND upper_inc(range) AND (lower(range) <= 190423) AND (upper(range) >= 190423)) OR ((NOT lower_inc(range)) AND upper_inc(range) AND (lower(range) < 190423) AND (upper(range) >= 190423)) OR ((NOT lower_inc(range)) AND (NOT upper_inc(range)) AND (lower(range) < 190423) AND (upper(range) > 190423)) OR ((NOT upper_inc(range)) AND (lower(range) IS NULL) AND (upper(range) > 190423)) OR (upper_inc(range) AND (lower(range) IS NULL) AND (upper(range) >= 190423)) OR ((NOT lower_inc(range)) AND (lower(range) < 190423) AND (upper(range) IS NULL)) OR (lower_inc(range) AND (lower(range) <= 190423) AND (upper(range) IS NULL)) OR ((lower(range) IS NULL) AND (upper(range) IS NULL)))|
| Rows Removed by Filter: 200000 |
|Planning Time: 0.062 ms |
|Execution Time: 791.539 ms |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
But that's okay, it's the or
s that are preventing it so we can fix this with more sql. Is there anything that can't be fixed with more sql? HAHAaha
explain analyse
select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= 190423 and upper(range) > 190423
union all
select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= 190423 and upper(range) >= 190423
union all
select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < 190423 and upper(range) >= 190423
union all
select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < 190423 and upper(range) > 190423
union all
select * from test where not upper_inc(range) and lower(range) is null and upper(range) > 190423
union all
select * from test where upper_inc(range) and lower(range) is null and upper(range) >= 190423
union all
select * from test where not lower_inc(range) and lower(range) < 190423 and upper(range) is null
union all
select * from test where lower_inc(range) and lower(range) <= 190423 and upper(range) is null
union all
select * from test where lower(range) is null and upper(range) is null;
+---------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+---------------------------------------------------------------------------------------------------------------------------+
|Append (cost=0.00..49.50 rows=90 width=32) (actual time=32.430..184.608 rows=1 loops=1) |
| -> Index Scan using testr on test (cost=0.00..5.40 rows=10 width=32) (actual time=32.429..32.432 rows=1 loops=1) |
| Index Cond: ((lower(range) <= 190423) AND (upper(range) > 190423)) |
| Filter: (lower_inc(range) AND (NOT upper_inc(range))) |
| -> Index Scan using testr on test test_1 (cost=0.00..5.40 rows=10 width=32) (actual time=31.796..31.796 rows=0 loops=1)|
| Index Cond: ((lower(range) <= 190423) AND (upper(range) >= 190423)) |
| Filter: (lower_inc(range) AND upper_inc(range)) |
| Rows Removed by Filter: 1 |
| -> Index Scan using testr on test test_2 (cost=0.00..5.40 rows=10 width=32) (actual time=28.206..28.206 rows=0 loops=1)|
| Index Cond: ((lower(range) < 190423) AND (upper(range) >= 190423)) |
| Filter: ((NOT lower_inc(range)) AND upper_inc(range)) |
| Rows Removed by Filter: 1 |
| -> Index Scan using testr on test test_3 (cost=0.00..5.40 rows=10 width=32) (actual time=27.761..27.761 rows=0 loops=1)|
| Index Cond: ((lower(range) < 190423) AND (upper(range) > 190423)) |
| Filter: ((NOT lower_inc(range)) AND (NOT upper_inc(range))) |
| Rows Removed by Filter: 1 |
| -> Index Scan using testr on test test_4 (cost=0.00..5.33 rows=10 width=32) (actual time=0.468..0.469 rows=0 loops=1) |
| Index Cond: ((lower(range) IS NULL) AND (upper(range) > 190423)) |
| Filter: (NOT upper_inc(range)) |
| -> Index Scan using testr on test test_5 (cost=0.00..5.33 rows=10 width=32) (actual time=0.400..0.400 rows=0 loops=1) |
| Index Cond: ((lower(range) IS NULL) AND (upper(range) >= 190423)) |
| Filter: upper_inc(range) |
| -> Index Scan using testr on test test_6 (cost=0.00..5.33 rows=10 width=32) (actual time=27.423..27.423 rows=0 loops=1)|
| Index Cond: ((lower(range) < 190423) AND (upper(range) IS NULL)) |
| Filter: (NOT lower_inc(range)) |
| -> Index Scan using testr on test test_7 (cost=0.00..5.33 rows=10 width=32) (actual time=35.735..35.735 rows=0 loops=1)|
| Index Cond: ((lower(range) <= 190423) AND (upper(range) IS NULL)) |
| Filter: lower_inc(range) |
| -> Index Scan using testr on test test_8 (cost=0.00..5.25 rows=10 width=32) (actual time=0.375..0.375 rows=0 loops=1) |
| Index Cond: ((lower(range) IS NULL) AND (upper(range) IS NULL)) |
|Planning Time: 0.163 ms |
|Execution Time: 184.695 ms |
+---------------------------------------------------------------------------------------------------------------------------+
Performance took a hit but we are still running circles around the non indexed operator. Sadly this doesn't even handle the case where you are checking if a range contains another range.That's okay let's keep going
create or replace function contains(value int4)
returns setof test
language plpgsql
as
$$
begin
return query select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= value and upper(range) > value;
return query select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= value and upper(range) >= value;
return query select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < value and upper(range) >= value;
return query select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < value and upper(range) > value;
return query select * from test where not upper_inc(range) and lower(range) is null and upper(range) > value;
return query select * from test where upper_inc(range) and lower(range) is null and upper(range) >= value;
return query select * from test where not lower_inc(range) and lower(range) < value and upper(range) is null;
return query select * from test where lower_inc(range) and lower(range) <= value and upper(range) is null;
return query select * from test where lower(range) is null and upper(range) is null;
return;
end;
$$;
create or replace function contains(value int4range)
returns setof test
language plpgsql
as
$$
begin
return query select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= lower(value) and upper(range) > upper(value);
return query select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= lower(value) and upper(range) >= upper(value);
return query select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < lower(value) and upper(range) >= upper(value);
return query select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < lower(value) and upper(range) > upper(value);
return query select * from test where not upper_inc(range) and lower(range) is null and upper(range) > upper(value);
return query select * from test where upper_inc(range) and lower(range) is null and upper(range) >= upper(value);
return query select * from test where not lower_inc(range) and lower(range) < lower(value) and upper(range) is null;
return query select * from test where lower_inc(range) and lower(range) <= lower(value) and upper(range) is null;
return query select * from test where lower(range) is null and upper(range) is null;
return;
end;
$$;
explain analyse
select * from contains(190423);
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains (cost=0.25..10.25 rows=1000 width=32) (actual time=178.591..178.592 rows=1 loops=1)|
--|Planning Time: 0.022 ms |
--|Execution Time: 178.604 ms |
--+--------------------------------------------------------------------------------------------------------------+
explain analyse
select * from contains(int4range(190420, 190424, '[)'));
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains (cost=0.25..10.25 rows=1000 width=32) (actual time=173.604..173.604 rows=1 loops=1)|
--|Planning Time: 0.042 ms |
--|Execution Time: 173.621 ms |
--+--------------------------------------------------------------------------------------------------------------+
Looking good except that you'd have to create one per table per range. Yuck!
create or replace function contains(tbl anyelement, range text, value int4)
returns setof anyelement
language plpgsql
as
$$
declare
r text;
begin
r := quote_ident(range);
return query execute format('select * from %s where lower_inc(%s) and not upper_inc(%s) and lower(%s) <= %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where lower_inc(%s) and upper_inc(%s) and lower(%s) <= %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where not lower_inc(%s) and upper_inc(%s) and lower(%s) < %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where not lower_inc(%s) and not upper_inc(%s) and lower(%s) < %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where not upper_inc(%s) and lower(%s) is null and upper(%s) > %L', pg_typeof(tbl), r, r, r, value);
return query execute format('select * from %s where upper_inc(%s) and lower(%s) is null and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value);
return query execute format('select * from %s where not lower_inc(%s) and lower(%s) < %L and upper(%s) is null', pg_typeof(tbl), r, r, value, r);
return query execute format('select * from %s where lower_inc(%s) and lower(%s) <= %L and upper(%s) is null', pg_typeof(tbl), r, r, value, r);
return query execute format('select * from %s where lower(%s) is null and upper(%s) is null', pg_typeof(tbl), r, r);
return;
end;
$$;
create or replace function contains(tbl anyelement, range text, value int4range)
returns setof anyelement
language plpgsql
as
$$
declare
r text;
begin
r := quote_ident(range);
return query execute format('select * from %s where lower_inc(%s) and not upper_inc(%s) and lower(%s) <= %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where lower_inc(%s) and upper_inc(%s) and lower(%s) <= %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where not lower_inc(%s) and upper_inc(%s) and lower(%s) < %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where not lower_inc(%s) and not upper_inc(%s) and lower(%s) < %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where not upper_inc(%s) and lower(%s) is null and upper(%s) > %L', pg_typeof(tbl), r, r, r, upper(value));
return query execute format('select * from %s where upper_inc(%s) and lower(%s) is null and upper(%s) >= %L', pg_typeof(tbl), r, r, r, upper(value));
return query execute format('select * from %s where not lower_inc(%s) and lower(%s) < %L and upper(%s) is null', pg_typeof(tbl), r, r, lower(value), r);
return query execute format('select * from %s where lower_inc(%s) and lower(%s) <= %L and upper(%s) is null', pg_typeof(tbl), r, r, lower(value), r);
return query execute format('select * from %s where lower(%s) is null and upper(%s) is null', pg_typeof(tbl), r, r);
return;
end;
$$;
explain analyse
select * from contains(null::test, 'range', 190423);
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains (cost=0.25..10.25 rows=1000 width=32) (actual time=236.467..236.468 rows=1 loops=1)|
--|Planning Time: 0.022 ms |
--|Execution Time: 236.508 ms |
--+--------------------------------------------------------------------------------------------------------------+
explain analyse
select * from contains(null::test, 'range', int4range(190420, 190424, '[)'));
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains (cost=0.25..10.25 rows=1000 width=32) (actual time=224.615..224.616 rows=1 loops=1)|
--|Planning Time: 0.031 ms |
--|Execution Time: 224.628 ms |
--+--------------------------------------------------------------------------------------------------------------+
And the includes operator is probably the easiest one to implement! :crying_cat_face:
When yuga is able to index range queries send me a postcard at the Mountains of Madness where me and the ghost of Richard Snodgrass will be working on hand written range queries
@EmiPhil, thanks for doing this study!
Jira Link: DB-2563 Creating this for tracking purposes.
Observed using single-node YB-2.4.0.0 cluster on macOS.
The "create index" attempt causes the 0A000 error. (This is mapped to the
feature_not_supported
exception.) The error text is "INDEX on column of type 'INT4RANGE' not yet supported". Corresponding attempts using the other range data types (int8range
,numrange
,tsrange
,tstzrange
, anddaterange
) all cause the 0A000 error with data-type-specific error texts.)Similarly, this attempt (and for the other range data types):
also cause the the 0A000 error.
All of these examples run without error on vanilla Postgress 11.2.