voltrondata-labs / arrowbench

R package for benchmarking

TPC-H Query 14 has a slight typo in the dbplyr implementation #57

Open jonkeane opened 3 years ago

jonkeane commented 3 years ago

The arrow and dplyr results match, but arrow and duckdb differ slightly in `promo_revenue`:

```
# query 14 scale factor 0.01
Loading required package: DBI
[1] 0
==================================================
  Query: 14
==================================================
  ✔ No differences
arrow vs duckdb
promo_revenue
- arrow[1, ]       16.28156
+ duckdb[1, ]      15.48655

`arrow$promo_revenue`: 16.3
`duckdb$promo_revenue`: 15.5
```

```
# query 14 scale factor 0.1
jkeane@het arrowbench % ARROWBENCH_LOCAL_DIR="~/repos/ab_store/het" Rscript inst/tpch-answer-gen.R
Loading required package: DBI
[1] 0
==================================================
  Query: 14
==================================================
  ✔ No differences
arrow vs duckdb
promo_revenue
- arrow[1, ]       16.67804
+ duckdb[1, ]      16.28386

`arrow$promo_revenue`: 16.7
`duckdb$promo_revenue`: 16.3
```

jonkeane commented 2 years ago

These differences are probably caused by a small error in the query:

https://github.com/ursacomputing/arrowbench/blob/e8bcce4ed54168623ad0a4205da1187d61f1a9dd/R/tpch-queries.R#L499

This date should be `1995-09-01`, the start of the one-month shipping window that TPC-H Query 14 uses.
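For context, TPC-H Query 14 computes `promo_revenue` as 100 × (revenue from parts whose `p_type` starts with `PROMO`) / (total revenue), where revenue is `l_extendedprice * (1 - l_discount)`, over line items whose `l_shipdate` falls in the one-month window starting at 1995-09-01. Because the filter is a date window, a wrong start date changes which rows are counted, so engines running slightly different versions of the query can disagree. The base-R sketch below is hypothetical: it is not the arrowbench dplyr/dbplyr code, and the toy `lineitem`/`part` data and the `q14_promo_revenue()` helper are made up for illustration.

```r
# Hypothetical toy data: column names follow the TPC-H schema, values are made up.
lineitem <- data.frame(
  l_partkey       = c(1L, 2L, 1L, 2L),
  l_extendedprice = c(100, 200, 300, 400),
  l_discount      = c(0.1, 0, 0.5, 0),
  l_shipdate      = as.Date(c("1995-09-01", "1995-09-15", "1995-09-30", "1995-10-01"))
)
part <- data.frame(
  p_partkey = c(1L, 2L),
  p_type    = c("PROMO BRUSHED TIN", "STANDARD POLISHED COPPER"),
  stringsAsFactors = FALSE  # keep p_type as character so startsWith() works on older R
)

# Sketch of the Q14 logic: percentage of revenue from PROMO parts
# shipped in the half-open window [start, start + 1 month).
q14_promo_revenue <- function(lineitem, part, start = as.Date("1995-09-01")) {
  window_end <- seq(start, by = "month", length.out = 2)[2]  # start + 1 month
  in_window <- lineitem$l_shipdate >= start & lineitem$l_shipdate < window_end
  joined <- merge(lineitem[in_window, ], part, by.x = "l_partkey", by.y = "p_partkey")
  revenue <- joined$l_extendedprice * (1 - joined$l_discount)
  is_promo <- startsWith(joined$p_type, "PROMO")
  100 * sum(revenue[is_promo]) / sum(revenue)
}

q14_promo_revenue(lineitem, part)                         # [1] 54.54545
q14_promo_revenue(lineitem, part, as.Date("1995-09-02"))  # [1] 20
```

Moving the window start by a single day drops the 1995-09-01 row and pulls in the 1995-10-01 row, so the toy result falls from about 54.5 to 20. On real TPC-H data the shift is much smaller, but the same mechanism would plausibly account for the arrow vs duckdb gaps reported above.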