Open borderlayout opened 2 years ago
---------- the prestodb------
the summary:
the livePlan:
part of the plan:
the join logic:
As the picture shows the dynamic filter is working, but the loaded data is not reduced.
--------trino--------- I also test the trino 384, the summary:
the live plan
part of plan:
the join logic:
The trino loads 1 percent of data size. For example, the input data of store_sales is 57.6M in prestodb, but the input data of the same table is 595K in trino . The total input data is 11.6M in trino, and it is smaller than 125M in prestodb.
presto: 05_presto_dynamic_explain.txt
trino: 05_trino_dynamic_explain.txt
FYI @ClarenceThreepwood @fgwang7w
This will be fixed/improved with #15763
Thank you @ClarenceThreepwood
My env: data:tpc-ds 20G on hive. all hive tables have statistics info(presto had already run analyze table). presto version: 0.273.1 presto config: experimental.enable-dynamic-filtering=true , optimizer.join-reordering-strategy=AUTOMATIC join-distribution-type=AUTOMATIC
I run the tpc-ds in my test env and find the dynamic filter can not reduce the amount of scanned data of the fact table in some cases:
tpcds 05.sql is: WITH ssr AS ( SELECT "s_store_id" , "sum"("sales_price") "sales" , "sum"("profit") "profit" , "sum"("return_amt") "returns" , "sum"("net_loss") "profit_loss" FROM ( SELECT "ss_store_sk" "store_sk" , "ss_sold_date_sk" "date_sk" , "ss_ext_sales_price" "sales_price" , "ss_net_profit" "profit" , CAST(0 AS DECIMAL(7,2)) "return_amt" , CAST(0 AS DECIMAL(7,2)) "net_loss" FROM store_sales UNION ALL SELECT "sr_store_sk" "store_sk" , "sr_returned_date_sk" "date_sk" , CAST(0 AS DECIMAL(7,2)) "sales_price" , CAST(0 AS DECIMAL(7,2)) "profit" , "sr_return_amt" "return_amt" , "sr_net_loss" "net_loss" FROM store_returns ) salesreturns , date_dim , store WHERE ("date_sk" = "d_date_sk") AND ("d_date" BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '14' DAY)) AND ("store_sk" = "s_store_sk") GROUP BY "s_store_id" ) , csr AS ( SELECT "cp_catalog_page_id" , "sum"("sales_price") "sales" , "sum"("profit") "profit" , "sum"("return_amt") "returns" , "sum"("net_loss") "profit_loss" FROM ( SELECT "cs_catalog_page_sk" "page_sk" , "cs_sold_date_sk" "date_sk" , "cs_ext_sales_price" "sales_price" , "cs_net_profit" "profit" , CAST(0 AS DECIMAL(7,2)) "return_amt" , CAST(0 AS DECIMAL(7,2)) "net_loss" FROM catalog_sales UNION ALL SELECT "cr_catalog_page_sk" "page_sk" , "cr_returned_date_sk" "date_sk" , CAST(0 AS DECIMAL(7,2)) "sales_price" , CAST(0 AS DECIMAL(7,2)) "profit" , "cr_return_amount" "return_amt" , "cr_net_loss" "net_loss" FROM catalog_returns ) salesreturns , date_dim , catalog_page WHERE ("date_sk" = "d_date_sk") AND ("d_date" BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '14' DAY)) AND ("page_sk" = "cp_catalog_page_sk") GROUP BY "cp_catalog_page_id" ) , wsr AS ( SELECT "web_site_id" , "sum"("sales_price") "sales" , "sum"("profit") "profit" , "sum"("return_amt") "returns" , "sum"("net_loss") "profit_loss" FROM ( SELECT "ws_web_site_sk" "wsr_web_site_sk" , "ws_sold_date_sk" "date_sk" , "ws_ext_sales_price" "sales_price" , "ws_net_profit" "profit" , CAST(0 AS DECIMAL(7,2)) "return_amt" , CAST(0 AS DECIMAL(7,2)) "net_loss" FROM web_sales UNION ALL SELECT "ws_web_site_sk" "wsr_web_site_sk" , "wr_returned_date_sk" "date_sk" , CAST(0 AS DECIMAL(7,2)) "sales_price" , CAST(0 AS DECIMAL(7,2)) "profit" , "wr_return_amt" "return_amt" , "wr_net_loss" "net_loss" FROM (web_returns LEFT JOIN web_sales ON ("wr_item_sk" = "ws_item_sk") AND ("wr_order_number" = "ws_order_number")) ) salesreturns , date_dim , web_site WHERE ("date_sk" = "d_date_sk") AND ("d_date" BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '14' DAY)) AND ("wsr_web_site_sk" = "web_site_sk") GROUP BY "web_site_id" ) SELECT "channel" , "id" , "sum"("sales") "sales" , "sum"("returns") "returns" , "sum"("profit") "profit" FROM ( SELECT 'store channel' "channel" , "concat"('store', "s_store_id") "id" , "sales" , "returns" , ("profit" - "profit_loss") "profit" FROM ssr UNION ALL SELECT 'catalog channel' "channel" , "concat"('catalog_page', "cp_catalog_page_id") "id" , "sales" , "returns" , ("profit" - "profit_loss") "profit" FROM csr UNION ALL SELECT 'web channel' "channel" , "concat"('web_site', "web_site_id") "id" , "sales" , "returns" , ("profit" - "profit_loss") "profit" FROM wsr ) x GROUP BY ROLLUP (channel, id) ORDER BY "channel" ASC, "id" ASC LIMIT 100