Open greenlion opened 3 years ago
FIXED Note: Most recent check in adds support for AVG so that this query works with parallel query!
where l_shipdate <= '1998-08-26' - interval 2 day
-> Sort: lineitem.l_returnflag, lineitem.l_linestatus (actual time=13486.608..13486.609 rows=4 loops=1)
-> Table scan on |
---|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (13.50 sec)
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date_sub('1998-12-01', interval 97 day)
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Problem: ECP isn't filtering rows
-> Sort: lineitem.l_returnflag, lineitem.l_linestatus (actual time=15966.831..15966.832 rows=4 loops=1)
-> Table scan on
Optimized version:
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= '1998-08-26'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
EXPLAIN: -> Sort: lineitem.l_returnflag, lineitem.l_linestatus (actual time=15451.847..15451.848 rows=4 loops=1)
-> Table scan on
Note that the filtered number of rows is lower on the second optimized query. ECP isn't working with date_sub. Note cached evaluation in the first explain that is pushed down (not noted) in the second explain.
Also note that it is almost a FTS and so the index doesn't make any difference but it will likely in other DBT-3 queries, unless they all examine the whole lineitem table!
Turning off row visibility and transaction visibility checks shaves one second off the query. I don't think that is worth optimizing at this time. Turning off row visibility decreases count(*) query time by 2 tenths of a second (.92 with .76 without)
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 14
and p_type like '%TIN'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
LIMIT 100;```
-> Limit: 100 row(s) (actual time=46414.522..46414.544 rows=100 loops=1)
-> Sort: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey, limit input to 100 row(s) per chunk (actual time=46414.521..46414.540 rows=100 loops=1)
-> Stream results (cost=100372.45 rows=617) (actual time=161.047..46413.755 rows=155 loops=1)
-> Inner hash join (part.p_partkey = partsupp.ps_partkey), (partsupp.ps_supplycost = (select #2)) (cost=100372.45 rows=617) (actual time=161.043..46413.163 rows=155 loops=1)
-> Filter: ((part.p_size = 14) and (part.p_type like '%TIN')) (cost=39.64 rows=222) (actual time=0.040..19.192 rows=806 loops=1)
-> Table scan on part (cost=39.64 rows=200000) (actual time=0.017..13.420 rows=4153 loops=1)
-> Hash
-> Inner hash join (supplier.s_nationkey = nation.n_nationkey), (supplier.s_suppkey = partsupp.ps_suppkey) (cost=25026.69 rows=2500) (actual time=78.946..105.967 rows=3432 loops=1)
-> Table scan on supplier (cost=2.00 rows=10000) (actual time=4.565..29.905 rows=10000 loops=1)
-> Hash
-> Inner hash join (nation.n_regionkey = region.r_regionkey) (cost=26.69 rows=25) (actual time=53.164..58.589 rows=83060 loops=1)
-> Table scan on nation (cost=0.13 rows=25) (actual time=0.392..0.473 rows=25 loops=1)
-> Hash
-> Inner hash join (no condition) (cost=1.68 rows=10) (actual time=41.800..51.059 rows=16612 loops=1)
-> Table scan on partsupp (cost=0.33 rows=2) (actual time=41.117..48.776 rows=16612 loops=1)
-> Hash
-> Table scan on region (cost=0.55 rows=5) (actual time=0.651..0.665 rows=1 loops=1)
Runs in 44 seconds if I turn on statistics munging and make partsupp have a row count of 2.
Query does not complete without this optimization.
Query requires bitmap index join optimization.
Start working on the DBT-3 at scale factor 1. Problems with each query will be added to this list as comments.
This is a meta bug of sorts.