ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0
3.77k stars 520 forks source link

[YQL-18432] Сross join is not transformed into inner join #4975

Open aakulaga-ydb opened 3 months ago

aakulaga-ydb commented 3 months ago

`-- START PARAMETERS use ada; pragma TablePathPrefix = "home/tpcds/3Tb"; pragma CompactGroupBy;

pragma config.flags("OptimizerFlags", "FieldSubsetEnableMultiusage");

pragma yt.PartitionByConstantKeysViaMap;

pragma yt.MaxReplicationFactorToFuseOperations="100";

pragma yt.OperationSpec='{time_limit=3600000; locality_timeout=0; enable_partitioned_data_balancing=%false}';

pragma yt.TableContentLocalExecution;

-- END PARAMETERS

-- START COMMON HEADER pragma TablePathPrefix = "home/tpcds/3Tb"; pragma yt.TemporaryPrimaryMedium = "ssd_blobs_tpcds_bench"; pragma yt.PublishedPrimaryMedium = "ssd_blobs_tpcds_bench"; pragma yt.IntermediateDataMedium = "ssd_blobs_tpcds_bench"; pragma yt.Pool = "tpcds"; pragma yt.SchedulingTagFilter = "%true"; pragma yt.QueryCacheMode = "disable"; --pragma dq.WorkerFilter='ClusterName: "ada"'; --pragma DqEngine="auto"; pragma yt.HybridDqExecution = "true"; pragma yt.AutoMerge = "disabled"; pragma yt.DataSizePerJob = "16M"; pragma yt.DataSizePerMapJob = "24M"; --pragma yt.DataSizePerSortJob = "1G"; pragma yt.DataSizePerPartition = "64M"; pragma AnsiOptionalAs; pragma AnsiInForEmptyOrNullableItemsCollections; pragma yt.MapJoinLimit = "4G"; -- END COMMON HEADER

select item.i_item_id ,item.i_item_desc ,store.s_store_id ,store.s_store_name ,sum(ss_net_profit) as store_sales_profit ,sum(sr_net_loss) as store_returns_loss ,sum(cs_net_profit) as catalog_sales_profit from store_sales cross join date_dim d1 cross join store_returns cross join date_dim d2 cross join catalog_sales cross join date_dim d3 cross join store cross join item where d1.d_date_sk = ss_sold_date_sk and sr_returned_date_sk = d2.d_date_sk and cs_sold_date_sk = d3.d_date_sk and d1.d_moy = 4 and d1.d_year = 2000

and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_customer_sk = sr_customer_sk and ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number and d2.d_moy between 4 and 10 and d2.d_year = 2000 and sr_customer_sk = cs_bill_customer_sk and sr_item_sk = cs_item_sk and d3.d_moy between 4 and 10 and d3.d_year = 2000 group by item.i_item_id ,item.i_item_desc ,store.s_store_id ,store.s_store_name order by item.i_item_id ,item.i_item_desc ,store.s_store_id ,store.s_store_name limit 100; -- end query 1 in stream 0 using template query24.tpl `

По предварительным данным аффектит запросы q8, q25, q29

qrort commented 3 months ago

This Cross join does not decay into Inner: Image

The reason is RotateCrossJoin only supports the optimization if a label is present exactly as CrossJoin child. If join depth exceeds 1, no optimization takes place.

So we need to support nested join in RotateCrossJoin.

qrort commented 3 months ago
 from
 bindings.store_sales as store_sales
 cross join bindings.store_returns as store_returns
 cross join bindings.catalog_sales as catalog_sales
 cross join bindings.date_dim d1
 cross join bindings.date_dim d2
 cross join bindings.date_dim d3
 cross join bindings.store as store
 cross join bindings.item as item

this join order allows every Cross join to be rewritten as Inner.

qrort commented 2 months ago

will be done in https://github.com/ydb-platform/ydb/issues/5404