voltrondata-labs / arrowbench

R package for benchmarking

TPC-H Query 20, scale factor 10 wonkiness #56

Open jonkeane opened 2 years ago

jonkeane commented 2 years ago

arrow and dplyr agree on the answer, but the duckdb result is missing a single supplier, Supplier#000048933. That supplier looks like it meets the criteria, yet running the query in DuckDB does not return it.

library(DBI)
library(tibble)

# Assumes `con` is an open DBI connection to the DuckDB database holding the TPC-H scale factor 10 tables.
# This looks eligible for partkey 1998894, supplier key 48933
sub_sql <- "
SELECT
    *
FROM
    lineitem
INNER JOIN
    part
    ON l_partkey = p_partkey
WHERE
    p_name LIKE 'forest%'
    AND l_suppkey = 48933
    AND l_shipdate >= CAST('1994-01-01' AS date)
    AND l_shipdate < CAST('1995-01-01' AS date)
"
result_duckdb <- as_tibble(dbGetQuery(con, sub_sql))
> result_duckdb
# A tibble: 2 × 25
  l_orderkey l_partkey l_suppkey l_linenumber l_quantity l_extendedprice
       <int>     <int>     <int>        <int>      <int>           <dbl>
1   48224898   1998894     48933            2         35           69748
2   14710885   1998894     48933            4         45           89676
# … with 19 more variables: l_discount <dbl>, l_tax <dbl>, l_returnflag <chr>,
#   l_linestatus <chr>, l_shipdate <date>, l_commitdate <date>,
#   l_receiptdate <date>, l_shipinstruct <chr>, l_shipmode <chr>,
#   l_comment <chr>, p_partkey <int>, p_name <chr>, p_mfgr <chr>,
#   p_brand <chr>, p_type <chr>, p_size <int>, p_container <chr>,
#   p_retailprice <dbl>, p_comment <chr>
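
The two rows above only cover the part-name and ship-date conditions. In the standard TPC-H definition of Query 20, a supplier also has to pass an availability test (`ps_availqty` greater than half of the summed `l_quantity` for that part and supplier over the year) and a nation filter. The sketch below works out the threshold implied by the two rows and builds a follow-up query to check the remaining conditions. It assumes the standard TPC-H schema names (`partsupp`, `supplier`, `nation`) and that `con` is the same connection used above. `shipped_qty`, `threshold`, and `check_sql` are names made up for illustration, and the benchmark's own query text may differ from the standard one.

```r
# Quantities of the two qualifying lineitem rows shown above
# (partkey 1998894, suppkey 48933, shipped during 1994).
shipped_qty <- c(35, 45)

# Standard TPC-H Q20 keeps a partsupp row only if
# ps_availqty > 0.5 * sum(l_quantity) for that (part, supplier) pair.
threshold <- 0.5 * sum(shipped_qty)
threshold

# Follow-up query to inspect the remaining conditions for this supplier:
# the available quantity and the supplier's nation.
check_sql <- "
SELECT ps_partkey, ps_suppkey, ps_availqty, s_name, n_name
FROM partsupp
INNER JOIN supplier ON ps_suppkey = s_suppkey
INNER JOIN nation ON s_nationkey = n_nationkey
WHERE ps_partkey = 1998894
  AND ps_suppkey = 48933
"

# Assumption: `con` is an open DBI connection to the same DuckDB database.
# The query is skipped when no such object exists.
if (exists("con")) {
  print(DBI::dbGetQuery(con, check_sql))
}
```

If `ps_availqty` for this pair is above the threshold and the nation matches the one the query filters on, the supplier should appear in the Query 20 result, which would point at the engine or the query translation rather than the data.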