prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Update the TPCDS connector to be compliant with the latest spec and dsdgen #22932

Open yingsu00 opened 5 months ago

yingsu00 commented 5 months ago

The TPCDS specification and dsdgen have been updated several times in the last few years, but the Presto TPCDS connector hasn't been updated for a long time. We recently noticed several changes:

  1. The row count error margin was removed. Spec v2.1.0 allowed a 0.01% difference in row counts, but that margin was removed in 2019. Now all row counts should exactly match the latest dsdgen.
  2. The old dsdgen generates the columns of the modification data sets in a different order from the spec. This bug should have been fixed in 2021.
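
To illustrate the first change, here is a minimal sketch of the two row count checks. The function names and the numbers are hypothetical and only for illustration; they are not taken from the spec, dsdgen, or the Presto code base.

```python
def within_margin(actual: int, expected: int) -> bool:
    """Old-style check: accept a row count within 0.01% of the expected count.

    0.01% is 1/10000, so the comparison is done in integers to avoid
    floating-point rounding.
    """
    return abs(actual - expected) * 10_000 <= expected


def exact_match(actual: int, expected: int) -> bool:
    """New-style check: the row count must match exactly."""
    return actual == expected


# Illustrative numbers only (hypothetical table with 1,000,000 expected rows).
expected_rows = 1_000_000
generated_rows = 1_000_050

print(within_margin(generated_rows, expected_rows))  # True: 50 rows off is inside 0.01%
print(exact_match(generated_rows, expected_rows))    # False: any difference now fails
```

A connector that passed the old margin-based check could therefore still fail validation once exact matching is required.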

Expected Behavior or Use Case

The Presto TPCDS connector should generate exactly the same data as the latest spec and dsdgen.

Presto Component, Service, or Connector

The Presto Java based TPCDS connector

Possible Implementation

Example Screenshots (if appropriate):

Screenshot 2024-06-05 at 17 34 21 Screenshot 2024-06-05 at 17 48 49

Context

yingsu00 commented 5 months ago

cc @yzhang1991 @aditi-pandit @pratyakshsharma @nmahadevuni