Multi-column support for SkipScan

svenklemm commented 3 years ago

Initial implementation of skipscan only supports single column distinct but the implementation could be enhanced to support multiple columns as well.

jgm-ktg commented 3 years ago

-- index (a, b, time desc)
select distinct on (a, b) a, b, time from tbl where a in ('val1','val2') order by a, b, time desc;

The above is what we are talking about, right? Any estimate regarding when this feature will be added? Any workarounds in the meantime? Note that I cannot make a column combining a and b because I want to filter on a with the index.

jgm-ktg commented 3 years ago

                        Table "public.tbl4"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 col    | text                     |           | not null | 
 col2   | text                     |           | not null | 
 col3   | timestamp with time zone |           | not null | 
Indexes:
    "tbl4_col_col2_col3_idx" btree (col, col2, col3 DESC)

with recursive cte(c1,c2,c3) as (
    (select distinct on (col) col, col2, col3 from tbl4) -- parens required
    union all
    select * from (
        with tmp(c) as (select c2 from cte limit 1) -- can't use cte in subquery directly
        select distinct on (tbl4.col) tbl4.col, tbl4.col2, tbl4.col3 from tbl4
        where tbl4.col=any(array(select distinct on (col) col from tbl4))
        and tbl4.col2>(select c from tmp)
    ) t
)
select * from cte;

One try at mimic of this feature... depends on every value of col2 existing for each value of col

svenklemm commented 3 years ago

You can emulate this with lateral join like so:

SELECT * FROM   (SELECT DISTINCT ON (dev) dev FROM <table>) a,   LATERAL (SELECT DISTINCT ON (time) dev, time FROM <table> WHERE dev = a.dev) b;

jgm-ktg commented 3 years ago

You can emulate this with lateral join like so

@svenklemm Thanks for the tip. Can you put this in terms of col/col2/col3 above? I only see two columns referenced... time and dev

EDIT... it is almost as if I need the distinct on (col) to provide a list of starting records... if there are N of them, this would start N recursive CTEs to consume all of col2 for each col value? Can a recursive CTE be the rhs of a lateral?

EDIT: this has too many columns from a and doesn't quite seem to work out yet...

select * from
    (select distinct on (col) col, col2, col3 from tbl7) a,
    lateral (
        with recursive cte as (
           (select distinct on (col) col, col2, col3 from tbl7 where col=a.col and col2=a.col2)
           union all
           select * from (
             with tmp(c) as (select col2 from cte limit 1)
             select distinct on (col) col, col2, col3 from tbl7 where col=a.col and col2>(select c from tmp)
           ) t
        ) select * from cte
    ) b;

svenklemm commented 3 years ago

SELECT t2.* FROM 
  (SELECT DISTINCT ON (col) col FROM tbl4 t1) t1,
  LATERAL(SELECT DISTINCT ON (col2) * FROM tbl4 t2 WHERE t2.col=t1.col ORDER BY col2, col3 DESC) t2;

This emulates DISTINCT on (col, col2) and uses skipscan for both columns

jgm-ktg commented 3 years ago

This emulates DISTINCT on (col, col2) and uses skipscan for both columns

Thank you!!

jgm-ktg commented 3 years ago

linkup=# \d tbl7
                        Table "public.tbl7"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 col    | bigint                   |           | not null | 
 col2   | character(36)            |           | not null | 
 col3   | timestamp with time zone |           | not null | 
Indexes:
    "tbl7_col_col2_col3_idx" btree (col, col2, col3 DESC)

Nested Loop  (cost=1.12..1060514.95 rows=10000000 width=53)
   ->  Unique  (cost=0.56..3.00 rows=5 width=8)
         ->  Custom Scan (SkipScan) on tbl7 t1  (cost=0.56..2.98 rows=5 width=8)
               ->  Index Only Scan using tbl7_col_col2_col3_idx on tbl7 t1  (cost=0.56..367214.09 rows=10000000 width=8)
                     Index Cond: (col > NULL::bigint)
   ->  Unique  (cost=0.56..172102.38 rows=2000000 width=53)
         ->  Index Only Scan using tbl7_col_col2_col3_idx on tbl7 t2  (cost=0.56..167102.38 rows=2000000 width=53)
               Index Cond: (col = t1.col)

@svenklemm Any idea why it doesn't use skipscan with these alternate data types? In this case col2 is not as repetitive as with the data loaded in the first case, although there is some repetition.

EDIT:

linkup=# explain SELECT DISTINCT ON (col2) * FROM tbl4 t2 WHERE t2.col='1' ORDER BY col2, col3 DESC;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.43..2.44 rows=5 width=12)
   ->  Custom Scan (SkipScan) on tbl4 t2  (cost=0.43..2.43 rows=5 width=12)
         ->  Index Only Scan using tbl4_col_col2_col3_idx on tbl4 t2  (cost=0.43..101287.71 rows=1976867 width=12)
               Index Cond: ((col = '1'::text) AND (col2 > NULL::text))
(4 rows)

linkup=# explain SELECT DISTINCT ON (col2) * FROM tbl7 t2 WHERE t2.col=1 ORDER BY col2, col3 DESC;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.56..172478.21 rows=2012267 width=53)
   ->  Index Only Scan using tbl7_col_col2_col3_idx on tbl7 t2  (cost=0.56..167447.54 rows=2012267 width=53)
         Index Cond: (col = 1)
(3 rows)

I notice the (col2 > NULL::text) aspect of the first plan....

EDIT: I also noticed that col2 could not be type uuid in previous testing... more went wrong sooner. We can make these fields type text if absolutely necessary.

svenklemm commented 3 years ago

Hmm not sure but the datatypes are not the reason why it doesnt use skipscan. But in general you should always use text for any character columns.

https://wiki.postgresql.org/wiki/Don't_Do_This#Text_storage

dev=# \d+ tbl4
                                            Table "public.tbl4"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 col    | bigint                   |           |          |         | plain    |              |
 col2   | character(36)            |           |          |         | extended |              |
 col3   | timestamp with time zone |           |          |         | plain    |              |
Indexes:
    "tbl4_col_col2_col3_idx" btree (col, col2, col3 DESC)
Access method: heap

dev=# explain SELECT * FROM (SELECT DISTINCT ON (col) col FROM tbl4 t1) t1, LATERAL(SELECT DISTINCT ON (col2) * FROM tbl4 t2 WHERE t2.col=t1.col) t2;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.56..2.47 rows=4 width=61)
   ->  Unique  (cost=0.28..0.74 rows=2 width=8)
         ->  Custom Scan (SkipScan) on tbl4 t1  (cost=0.28..0.74 rows=2 width=8)
               ->  Index Only Scan using tbl4_col_col2_col3_idx on tbl4 t1  (cost=0.28..435.72 rows=5004 width=8)
                     Index Cond: (col > NULL::bigint)
   ->  Unique  (cost=0.28..0.81 rows=2 width=53)
         ->  Custom Scan (SkipScan) on tbl4 t2  (cost=0.28..0.81 rows=2 width=53)
               ->  Index Only Scan using tbl4_col_col2_col3_idx on tbl4 t2  (cost=0.28..301.04 rows=2502 width=53)
                     Index Cond: ((col = t1.col) AND (col2 > NULL::bpchar))
(9 rows)

jgm-ktg commented 3 years ago

linkup=# delete from tbl8
;
DELETE 10000000
linkup=# insert into tbl8
select trunc(random()*5), substring(uuid_generate_v4()::text from 1 for 3), to_timestamp(random()*31540000) from generate_series(1,10000000);
INSERT 0 10000000
linkup=# analyze tbl8;
ANALYZE
linkup=# explain select count(t2.*) FROM 
  (SELECT DISTINCT ON (col) col FROM tbl8 t1) t1,
  LATERAL(SELECT DISTINCT ON (col2) * FROM tbl8 t2 WHERE t2.col=t1.col ORDER BY col2, col3 DESC) t2;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=18177.56..18177.57 rows=1 width=8)
   ->  Nested Loop  (cost=1.12..18126.36 rows=20480 width=77)
         ->  Unique  (cost=0.56..3.43 rows=5 width=8)
               ->  Custom Scan (SkipScan) on tbl8 t1  (cost=0.56..3.41 rows=5 width=8)
                     ->  Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t1  (cost=0.56..1230713.24 rows=10018872 width=8)
                           Index Cond: (col > NULL::bigint)
         ->  Subquery Scan on t2  (cost=0.56..3583.62 rows=4096 width=77)
               ->  Unique  (cost=0.56..3542.66 rows=4096 width=53)
                     ->  Custom Scan (SkipScan) on tbl8 t2_1  (cost=0.56..3532.42 rows=4096 width=53)
                           ->  Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t2_1  (cost=0.56..605953.96 rows=2003774 width=53)
                                 Index Cond: ((col = t1.col) AND (col2 > NULL::bpchar))
(11 rows)

linkup=# delete from tbl8
;
DELETE 10000000
linkup=# insert into tbl8
select trunc(random()*5), substring(uuid_generate_v4()::text from 1 for 36), to_timestamp(random()*31540000) from generate_series(1,10000000);
INSERT 0 10000000
linkup=# analyze tbl8;
ANALYZE
linkup=# explain select count(t2.*) FROM 
  (SELECT DISTINCT ON (col) col FROM tbl8 t1) t1,
  LATERAL(SELECT DISTINCT ON (col2) * FROM tbl8 t2 WHERE t2.col=t1.col ORDER BY col2, col3 DESC) t2;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3694802.14..3694802.15 rows=1 width=8)
   ->  Nested Loop  (cost=1.12..3669602.14 rows=10080000 width=77)
         ->  Unique  (cost=0.56..3.47 rows=5 width=8)
               ->  Custom Scan (SkipScan) on tbl8 t1  (cost=0.56..3.45 rows=5 width=8)
                     ->  Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t1  (cost=0.56..1320389.02 rows=10080002 width=8)
                           Index Cond: (col > NULL::bigint)
         ->  Subquery Scan on t2  (cost=0.56..713759.72 rows=2016000 width=77)
               ->  Unique  (cost=0.56..693599.72 rows=2016000 width=53)
                     ->  Index Only Scan using tbl8_col_col2_col3_idx on tbl8 t2_1  (cost=0.56..688559.72 rows=2016000 width=53)
                           Index Cond: (col = t1.col)
(10 rows)

@svenklemm The query plan appears to be affected by the number of distinct values in col2. Is there something like pg_hint_plan that can help? Postgres really needs a way to force a query plan... sometimes the user knows best.

svenklemm commented 3 years ago

Yes skip scan is affected by the number of distinct values. If the number of distinct values is close to the number of rows not using skipscan is more efficient.

jgm-ktg commented 3 years ago

In my sample of real data the lateral strategy you cooked up above (thanks again!) is selecting about 4M of 14M records... that 14M being a small subsample of billions. It was fine (used SkipScan) after I ran ANALYZE, but at some point the stats were messed up and I can tell you that with only 3 or 4 "col2" values per "col" (on average) it was a lot slower to avoid SkipScan. I still think this should be under user control when needed...

jgm-ktg commented 3 years ago

@svenklemm Just FYI, on 2.4.1 (Timescale Forge), the following instance of your method above seems to cause problems with EXPLAIN in the context of a distributed hypertable space-partitioned by the_id:

explain verbose select the_table.* from
(select distinct on (the_id) the_id from the_table where the_id in (1111,2222) order by the_id, receive_time desc offset 0) cid,
lateral (

        select distinct on (id) * from the_table where the_table.the_id=cid.the_id order by id, receive_time desc offset 0

) the_table;

ERROR:  [data-node-0000]: bind message supplies 0 parameters, but prepared statement "" requires 1

...the query itself runs fine. Should this be a separate bug report?

This was in the context of timescaledb.enable_remote_explain=on ... this seems to be critical when reproducing the issue.

svenklemm commented 3 years ago

I dont think this is related to SkipScan there is already a bugreport for this: https://github.com/timescale/timescaledb/issues/3128

jgm-ktg commented 3 years ago

I dont think this is related to SkipScan there is already a bugreport for this: #3128

Odd, the query above worked! Only the EXPLAIN failed, and even that worked again with enable_remote_explain=off

timescale / timescaledb

Multi-column support for SkipScan #3094