prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.04k stars 5.37k forks source link

[native] Fix TPC-DS q90 failure in native engine #20054

Open aditi-pandit opened 1 year ago

aditi-pandit commented 1 year ago

TPC-DS q90 is failing in native tests with "Division by 0" error.

https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeTpcdsQueries.java#L1093

Please fix.

@yzhang1991

karteekmurthys commented 1 year ago

I am not able to reproduce the issue locally on M1 mac:

Screenshot 2023-07-07 at 11 34 28 AM
aditi-pandit commented 1 year ago

@karteekmurthys : The query is failing consistently in the build.

Please test this with a linux-build and investigate.

karteekmurthys commented 1 year ago

I was able to reproduce the issue:

presto:tpcds> SELECT (CAST("amc" AS DECIMAL(15,4)) / CAST("pmc" AS DECIMAL(15,4))) "am_pm_ratio" FROM ( SELECT "count"(*) "amc" FROM tpcds.web_sales , tpcds.household_demographics , tpcds.time_dim , tpcds.web_page WHERE ("ws_sold_time_sk" = "time_dim"."t_time_sk") AND ("ws_ship_hdemo_sk" = "household_demographics"."hd_demo_sk") AND ("ws_web_page_sk" = "web_page"."wp_web_page_sk") AND ("time_dim"."t_hour" BETWEEN 8 AND (8 + 1)) AND ("household_demographics"."hd_dep_count" = 6) AND ("web_page"."wp_char_count" BETWEEN 5000 AND 5200) ) "at" , ( SELECT "count"(*) "pmc" FROM tpcds.web_sales , tpcds.household_demographics , tpcds.time_dim , tpcds.web_page WHERE ("ws_sold_time_sk" = "time_dim"."t_time_sk") AND ("ws_ship_hdemo_sk" = "household_demographics"."hd_demo_sk") AND ("ws_web_page_sk" = "web_page"."wp_web_page_sk") AND ("time_dim"."t_hour" BETWEEN 19 AND (19 + 1)) AND ("household_demographics"."hd_dep_count" = 6) AND ("web_page"."wp_char_count" BETWEEN 5000 AND 5200) ) pt ORDER BY "am_pm_ratio" ASC LIMIT 100;

Query 20230710_201922_00011_stmqd, FAILED, 1 node
Splits: 40 total, 10 done (25.00%)
[Latency: client-side: 0:02, server-side: 0:02] [0 rows, 6.9KB] [0 rows/s, 4.05KB/s]

Query 20230710_201922_00011_stmqd failed: b != 0 (0 vs. 0) Division by zero presto.default.divide(cast((count) as DECIMAL(15,4)), cast((count_87) as DECIMAL(15,4)))
karteekmurthys commented 1 year ago

The data returned has zeros in it:

presto:tpcds> SELECT CAST("amc" AS DECIMAL(15,4)),CAST("pmc" AS DECIMAL(15,4)) FROM ( SELECT "count"(*) "amc" FROM tpcds.web_sales , tpcds.household_demographics , tpcds.time_dim , tpcds.web_page WHERE ("ws_sold_time_sk" = "time_dim"."t_time_sk") AND ("ws_ship_hdemo_sk" = "household_demographics"."hd_demo_sk") AND ("ws_web_page_sk" = "web_page"."wp_web_page_sk") AND ("time_dim"."t_hour" BETWEEN 8 AND (8 + 1)) AND ("household_demographics"."hd_dep_count" = 6) AND ("web_page"."wp_char_count" BETWEEN 5000 AND 5200) ) "at" , ( SELECT "count"(*) "pmc" FROM tpcds.web_sales , tpcds.household_demographics , tpcds.time_dim , tpcds.web_page WHERE ("ws_sold_time_sk" = "time_dim"."t_time_sk") AND ("ws_ship_hdemo_sk" = "household_demographics"."hd_demo_sk") AND ("ws_web_page_sk" = "web_page"."wp_web_page_sk") AND ("time_dim"."t_hour" BETWEEN 19 AND (19 + 1)) AND ("household_demographics"."hd_dep_count" = 6) AND ("web_page"."wp_char_count" BETWEEN 5000 AND 5200) ) pt;
 _col0  | _col1  
--------+--------
 0.0000 | 0.0000 
(1 row)

Query 20230710_203456_00015_stmqd, FINISHED, 1 node
Splits: 40 total, 10 done (25.00%)
[Latency: client-side: 0:02, server-side: 0:02] [0 rows, 6.9KB] [0 rows/s, 3.81KB/s]

Below is the result run only on the Java coordinator:

presto:tpcds> SELECT CAST("amc" AS DECIMAL(15,4)),CAST("pmc" AS DECIMAL(15,4)), (CAST("amc" AS DECIMAL(15,4)) / CAST("pmc" AS DECIMAL(15,4))) "am_pm_ratio" FROM ( SELECT "count"(*) "amc" FROM tpcds.web_sales , tpcds.household_demographics , tpcds.time_dim , tpcds.web_page WHERE ("ws_sold_time_sk" = "time_dim"."t_time_sk") AND ("ws_ship_hdemo_sk" = "household_demographics"."hd_demo_sk") AND ("ws_web_page_sk" = "web_page"."wp_web_page_sk") AND ("time_dim"."t_hour" BETWEEN 8 AND (8 + 1)) AND ("household_demographics"."hd_dep_count" = 6) AND ("web_page"."wp_char_count" BETWEEN 5000 AND 5200) ) "at" , ( SELECT "count"(*) "pmc" FROM tpcds.web_sales , tpcds.household_demographics , tpcds.time_dim , tpcds.web_page WHERE ("ws_sold_time_sk" = "time_dim"."t_time_sk") AND ("ws_ship_hdemo_sk" = "household_demographics"."hd_demo_sk") AND ("ws_web_page_sk" = "web_page"."wp_web_page_sk") AND ("time_dim"."t_hour" BETWEEN 19 AND (19 + 1)) AND ("household_demographics"."hd_dep_count" = 6) AND ("web_page"."wp_char_count" BETWEEN 5000 AND 5200) ) pt ORDER BY "am_pm_ratio" ASC LIMIT 100;

Query 20230710_215631_00003_pmv5t, FAILED, 1 node
Splits: 275 total, 224 done (81.45%)
[Latency: client-side: 0:06, server-side: 0:06] [110K rows, 636KB] [19K rows/s, 110KB/s]

Query 20230710_215631_00003_pmv5t failed: Division by zero
aditi-pandit commented 1 year ago

@karteekmurthys : Sounds good. Maybe add a comment for the test indicating the data has 0s and fails on both Java and C++ with the exception. So the behavior is expected.

Though it seems a bit strange that the TPC-DS benchmark should have such a test.