prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.96k stars 5.35k forks source link

[native] Add TPC-DS connector #23067

Open pdabre12 opened 3 months ago

pdabre12 commented 3 months ago

Description

Adds the native TPC-DS connector and the codegen (dsdgen) files. The TPC-DS connector will be a wrapper for the data generator (dsdgen) provided by the TPC organization, originally implemented in C. This implementation must mimic the behavior of the original C implementation. DuckDB has a TPC-DS connector that uses C++ files to wrap the C implementation, and we will use these C++ files for our implementation.

Motivation and Context

The goal is to add a TPC-DS connector to enhance our ability to write end-to-end tests for Prestissimo and conduct micro-benchmarks in Velox. This addition will significantly improve our testing capabilities by ensuring comprehensive coverage and performance validation.

Impact

With this change, we can now write e2e tests and microbenchmarks.

Test Plan

Native end-to-end tests. Future enhancements will include adding SpeedTest and ConnectorTest to the Velox repository.

Contributor checklist

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

TPC-DS Connector Changes
* Add TPC-DS connector for Presto C++. :pr:`23067`

Resolves: https://github.com/prestodb/presto/issues/22361

linux-foundation-easycla[bot] commented 3 months ago

CLA Signed

The committers listed above are authorized under a signed CLA.

steveburnett commented 2 months ago

Consider updating the Use Cases in the Presto C++ section of the Presto doc to add TPC-DS to the supported connectors.

steveburnett commented 1 month ago

Consider a release note entry similar to the following:

== RELEASE NOTES ==

TPC-DS Connector Changes
* Add TPC-DS connector for Presto C++. :pr:`23067`
pdabre12 commented 1 month ago

@aditi-pandit @yingsu00 This PR is now ready for review. Please take a look. Thanks.

pdabre12 commented 1 month ago

FYI: @czentgr

pdabre12 commented 1 month ago

@aditi-pandit @czentgr. Thanks for the review. Stalling this PR until we figure out the licensing.

pdabre12 commented 1 month ago

@czentgr @aditi-pandit The licensing issue is resolved. Addressed your comments , can you have another look? Thank you.

Will address the failing header checks soon.

aditi-pandit commented 2 weeks ago

@pdabre12 Some comments. ~I think we can merge the external portion first in a separate commit as they are copied externally. That makes it easy to review the new bits.~ I see that it is already a separate commit.

@aditi-pandit, @tdcmeehan I think we should have a RFC for this work. What do you think? The RFC can be simple.

@pdabre12 @majetideepak : Small RFC is okay. Let's bring this to Native worker working group next session so that it gets immediate attention from others.