Open svenklemm opened 3 years ago
-- index (a, b, time desc)
select distinct on (a, b) a, b, time from tbl where a in ('val1','val2') order by a, b, time desc;
The above is what we are talking about, right? Any estimate regarding when this feature will be added? Any workarounds in the meantime? Note that I cannot make a column combining a and b because I want to filter on a with the index.
Table "public.tbl4"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
col | text | | not null |
col2 | text | | not null |
col3 | timestamp with time zone | | not null |
Indexes:
"tbl4_col_col2_col3_idx" btree (col, col2, col3 DESC)
with recursive cte(c1,c2,c3) as (
(select distinct on (col) col, col2, col3 from tbl4) -- parens required
union all
select * from (
with tmp(c) as (select c2 from cte limit 1) -- can't use cte in subquery directly
select distinct on (tbl4.col) tbl4.col, tbl4.col2, tbl4.col3 from tbl4
where tbl4.col=any(array(select distinct on (col) col from tbl4))
and tbl4.col2>(select c from tmp)
) t
)
select * from cte;
One try at mimic of this feature... depends on every value of col2 existing for each value of col
You can emulate this with lateral join like so:
SELECT * FROM (SELECT DISTINCT ON (dev) dev FROM <table>) a, LATERAL (SELECT DISTINCT ON (time) dev, time FROM <table> WHERE dev = a.dev) b;
You can emulate this with lateral join like so
@svenklemm Thanks for the tip. Can you put this in terms of col/col2/col3 above? I only see two columns referenced... time
and dev
EDIT... it is almost as if I need the distinct on (col)
to provide a list of starting records... if there are N of them, this would start N recursive CTEs to consume all of col2 for each col value? Can a recursive CTE be the rhs of a lateral?
EDIT: this has too many columns from a and doesn't quite seem to work out yet...
select * from
(select distinct on (col) col, col2, col3 from tbl7) a,
lateral (
with recursive cte as (
(select distinct on (col) col, col2, col3 from tbl7 where col=a.col and col2=a.col2)
union all
select * from (
with tmp(c) as (select col2 from cte limit 1)
select distinct on (col) col, col2, col3 from tbl7 where col=a.col and col2>(select c from tmp)
) t
) select * from cte
) b;
SELECT t2.* FROM
(SELECT DISTINCT ON (col) col FROM tbl4 t1) t1,
LATERAL(SELECT DISTINCT ON (col2) * FROM tbl4 t2 WHERE t2.col=t1.col ORDER BY col2, col3 DESC) t2;
This emulates DISTINCT on (col, col2) and uses skipscan for both columns
This emulates DISTINCT on (col, col2) and uses skipscan for both columns
Thank you!!
linkup=# \d tbl7
Table "public.tbl7"
Column | Type | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
col | bigint | | not null |
col2 | character(36) | | not null |
col3 | timestamp with time zone | | not null |
Indexes:
"tbl7_col_col2_col3_idx" btree (col, col2, col3 DESC)
Nested Loop (cost=1.12..1060514.95 rows=10000000 width=53)
-> Unique (cost=0.56..3.00 rows=5 width=8)
-> Custom Scan (SkipScan) on tbl7 t1 (cost=0.56..2.98 rows=5 width=8)
-> Index Only Scan using tbl7_col_col2_col3_idx on tbl7 t1 (cost=0.56..367214.09 rows=10000000 width=8)
Index Cond: (col > NULL::bigint)
-> Unique (cost=0.56..172102.38 rows=2000000 width=53)
-> Index Only Scan using tbl7_col_col2_col3_idx on tbl7 t2 (cost=0.56..167102.38 rows=2000000 width=53)
Index Cond: (col = t1.col)
@svenklemm Any idea why it doesn't use skipscan with these alternate data types? In this case col2 is not as repetitive as with the data loaded in the first case, although there is some repetition.
EDIT:
linkup=# explain SELECT DISTINCT ON (col2) * FROM tbl4 t2 WHERE t2.col='1' ORDER BY col2, col3 DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Unique (cost=0.43..2.44 rows=5 width=12)
-> Custom Scan (SkipScan) on tbl4 t2 (cost=0.43..2.43 rows=5 width=12)
-> Index Only Scan using tbl4_col_col2_col3_idx on tbl4 t2 (cost=0.43..101287.71 rows=1976867 width=12)
Index Cond: ((col = '1'::text) AND (col2 > NULL::text))
(4 rows)
linkup=# explain SELECT DISTINCT ON (col2) * FROM tbl7 t2 WHERE t2.col=1 ORDER BY col2, col3 DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Unique (cost=0.56..172478.21 rows=2012267 width=53)
-> Index Only Scan using tbl7_col_col2_col3_idx on tbl7 t2 (cost=0.56..167447.54 rows=2012267 width=53)
Index Cond: (col = 1)
(3 rows)
I notice the (col2 > NULL::text)
aspect of the first plan....
EDIT: I also noticed that col2 could not be type uuid
in previous testing... more went wrong sooner. We can make these fields type text
if absolutely necessary.
Hmm not sure but the datatypes are not the reason why it doesnt use skipscan. But in general you should always use text for any character columns.
https://wiki.postgresql.org/wiki/Don't_Do_This#Text_storage
dev=# \d+ tbl4
Table "public.tbl4"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
col | bigint | | | | plain | |
col2 | character(36) | | | | extended | |
col3 | timestamp with time zone | | | | plain | |
Indexes:
"tbl4_col_col2_col3_idx" btree (col, col2, col3 DESC)
Access method: heap
dev=# explain SELECT * FROM (SELECT DISTINCT ON (col) col FROM tbl4 t1) t1, LATERAL(SELECT DISTINCT ON (col2) * FROM tbl4 t2 WHERE t2.col=t1.col) t2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.56..2.47 rows=4 width=61)
-> Unique (cost=0.28..0.74 rows=2 width=8)
-> Custom Scan (SkipScan) on tbl4 t1 (cost=0.28..0.74 rows=2 width=8)
-> Index Only Scan using tbl4_col_col2_col3_idx on tbl4 t1 (cost=0.28..435.72 rows=5004 width=8)
Index Cond: (col > NULL::bigint)
-> Unique (cost=0.28..0.81 rows=2 width=53)
-> Custom Scan (SkipScan) on tbl4 t2 (cost=0.28..0.81 rows=2 width=53)
-> Index Only Scan using tbl4_col_col2_col3_idx on tbl4 t2 (cost=0.28..301.04 rows=2502 width=53)
Index Cond: ((col = t1.col) AND (col2 > NULL::bpchar))
(9 rows)
linkup=# delete from tbl8
;
DELETE 10000000
linkup=# insert into tbl8
select trunc(random()*5), substring(uuid_generate_v4()::text from 1 for 3), to_timestamp(random()*31540000) from generate_series(1,10000000);
INSERT 0 10000000
linkup=# analyze tbl8;
ANALYZE
linkup=# explain select count(t2.*) FROM
(SELECT DISTINCT ON (col) col FROM tbl8 t1) t1,
LATERAL(SELECT DISTINCT ON (col2) * FROM tbl8 t2 WHERE t2.col=t1.col ORDER BY col2, col3 DESC) t2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=18177.56..18177.57 rows=1 width=8)
-> Nested Loop (cost=1.12..18126.36 rows=20480 width=77)
-> Unique (cost=0.56..3.43 rows=5 width=8)
-> Custom Scan (SkipScan) on tbl8 t1 (cost=0.56..3.41 rows=5 width=8)
-> Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t1 (cost=0.56..1230713.24 rows=10018872 width=8)
Index Cond: (col > NULL::bigint)
-> Subquery Scan on t2 (cost=0.56..3583.62 rows=4096 width=77)
-> Unique (cost=0.56..3542.66 rows=4096 width=53)
-> Custom Scan (SkipScan) on tbl8 t2_1 (cost=0.56..3532.42 rows=4096 width=53)
-> Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t2_1 (cost=0.56..605953.96 rows=2003774 width=53)
Index Cond: ((col = t1.col) AND (col2 > NULL::bpchar))
(11 rows)
linkup=# delete from tbl8
;
DELETE 10000000
linkup=# insert into tbl8
select trunc(random()*5), substring(uuid_generate_v4()::text from 1 for 36), to_timestamp(random()*31540000) from generate_series(1,10000000);
INSERT 0 10000000
linkup=# analyze tbl8;
ANALYZE
linkup=# explain select count(t2.*) FROM
(SELECT DISTINCT ON (col) col FROM tbl8 t1) t1,
LATERAL(SELECT DISTINCT ON (col2) * FROM tbl8 t2 WHERE t2.col=t1.col ORDER BY col2, col3 DESC) t2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3694802.14..3694802.15 rows=1 width=8)
-> Nested Loop (cost=1.12..3669602.14 rows=10080000 width=77)
-> Unique (cost=0.56..3.47 rows=5 width=8)
-> Custom Scan (SkipScan) on tbl8 t1 (cost=0.56..3.45 rows=5 width=8)
-> Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t1 (cost=0.56..1320389.02 rows=10080002 width=8)
Index Cond: (col > NULL::bigint)
-> Subquery Scan on t2 (cost=0.56..713759.72 rows=2016000 width=77)
-> Unique (cost=0.56..693599.72 rows=2016000 width=53)
-> Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t2_1 (cost=0.56..688559.72 rows=2016000 width=53)
Index Cond: (col = t1.col)
(10 rows)
@svenklemm The query plan appears to be affected by the number of distinct values in col2. Is there something like pg_hint_plan that can help? Postgres really needs a way to force a query plan... sometimes the user knows best.
Yes skip scan is affected by the number of distinct values. If the number of distinct values is close to the number of rows not using skipscan is more efficient.
In my sample of real data the lateral strategy you cooked up above (thanks again!) is selecting about 4M of 14M records... that 14M being a small subsample of billions. It was fine (used SkipScan) after I ran ANALYZE, but at some point the stats were messed up and I can tell you that with only 3 or 4 "col2" values per "col" (on average) it was a lot slower to avoid SkipScan. I still think this should be under user control when needed...
@svenklemm Just FYI, on 2.4.1 (Timescale Forge), the following instance of your method above seems to cause problems with EXPLAIN in the context of a distributed hypertable space-partitioned by the_id:
explain verbose select the_table.* from
(select distinct on (the_id) the_id from the_table where the_id in (1111,2222) order by the_id, receive_time desc offset 0) cid,
lateral (
select distinct on (id) * from the_table where the_table.the_id=cid.the_id order by id, receive_time desc offset 0
) the_table;
ERROR: [data-node-0000]: bind message supplies 0 parameters, but prepared statement "" requires 1
...the query itself runs fine. Should this be a separate bug report?
This was in the context of timescaledb.enable_remote_explain=on ... this seems to be critical when reproducing the issue.
I dont think this is related to SkipScan there is already a bugreport for this: https://github.com/timescale/timescaledb/issues/3128
I dont think this is related to SkipScan there is already a bugreport for this: #3128
Odd, the query above worked! Only the EXPLAIN failed, and even that worked again with enable_remote_explain=off
Initial implementation of skipscan only supports single column distinct but the implementation could be enhanced to support multiple columns as well.