Open borderlayout opened 2 years ago
I also run this test on trino 383, the same sql and data. The live plan is
presto analyze: presto.txt
trino analzye: trino.txt
https://github.com/prestodb/presto/pull/17849 there is a pull request for this,but after merged this pr,hive connector still has this issue。 trino384_hive_plan.txt presto273_hive_plan.txt
CC @rschlussel @fgwang7w
@borderlayout Can you retest ? I think this has been fixed for a while. E.g see Q17 plan for the (upartitioned) TPCDS schema - https://github.com/prestodb/presto/blob/f12e3d1cd05852fdb723c2a9f0537d013cf1bcbf/presto-benchto-benchmarks/src/test/resources/sql/presto/tpcds/q17.plan.txt#L12-L24
I run some tpc-ds sqls and find the reorderJoins rule does not supply the best join order. env: presto version 0.273, it has default configuraion and config.properies adds these :
JOIN opt
optimizer.join-reordering-strategy=AUTOMATIC join-distribution-type=AUTOMATIC Test tables are on Hive (tpcds17 and tpcds25). Before run tests, all tables have statistical information(for CBO).
For example, tpcds 17: SELECT i_item_id , i_item_desc , s_state , count(ss_quantity) store_sales_quantitycount , avg(ss_quantity) store_sales_quantityave , stddev_samp(ss_quantity) store_sales_quantitystdev , (stddev_samp(ss_quantity) / avg(ss_quantity)) store_sales_quantitycov , count(sr_return_quantity) store_returns_quantitycount , avg(sr_return_quantity) store_returns_quantityave , stddev_samp(sr_return_quantity) store_returns_quantitystdev , (stddev_samp(sr_return_quantity) / avg(sr_return_quantity)) store_returns_quantitycov , count(cs_quantity) catalog_sales_quantitycount , avg(cs_quantity) catalog_sales_quantityave , stddev_samp(cs_quantity) catalog_sales_quantitystdev , (stddev_samp(cs_quantity) / avg(cs_quantity)) catalog_sales_quantitycov FROM store_sales , store_returns , catalog_sales , date_dim d1 , date_dim d2 , date_dim d3 , store , item WHERE (d1.d_quarter_name = '2001Q1') AND (d1.d_date_sk = ss_sold_date_sk) AND (i_item_sk = ss_item_sk) AND (s_store_sk = ss_store_sk) AND (ss_customer_sk = sr_customer_sk) AND (ss_item_sk = sr_item_sk) AND (ss_ticket_number = sr_ticket_number) AND (sr_returned_date_sk = d2.d_date_sk) AND (d2.d_quarter_name IN ('2001Q1', '2001Q2', '2001Q3')) AND (sr_customer_sk = cs_bill_customer_sk) AND (sr_item_sk = cs_item_sk) AND (cs_sold_date_sk = d3.d_date_sk) AND (d3.d_quarter_name IN ('2001Q1', '2001Q2', '2001Q3')) GROUP BY i_item_id, i_item_desc, s_state ORDER BY i_item_id ASC, i_item_desc ASC, s_state ASC LIMIT 100;
The store_sales, store_returns and catalog_sales are all fact tables, As we known, the row size of fact table is larger than dimension table. But presto runs the sql as follow: the Live Plan:
The store_sales join store_returns firstly.
So it causes the input rows and input data are very large.
Is it because my Presto is lack of configuration or is it a bug?
Thank you.