spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0

[DNM] benchmarks against object storage #1472

Open a10y opened 6 days ago

a10y commented 6 days ago

I'm using this PR as a space to collect some info about running the TPC-H queries against object storage. Goals are to compare:

- Against storage backends

Changes

This PR creates a new binary that runs every TPC-H query while logging IOs in our objectstore reader, allowing us to examine both request sizes and request counts for each query.
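As a rough illustration of the kind of bookkeeping this implies, here is a minimal sketch that tallies request counts and request sizes per query. It is not the code in this PR: `IoRecorder`, `IoStats`, and their methods are hypothetical names, and the sketch assumes the reader can report the byte range of each request it issues.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical per-query IO statistics (not a Vortex or object_store type).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct IoStats {
    pub request_count: u64,
    pub total_bytes: u64,
    pub max_request_bytes: u64,
}

/// Hypothetical recorder that a reader wrapper could call once per ranged read.
#[derive(Debug, Default)]
pub struct IoRecorder {
    per_query: BTreeMap<String, IoStats>,
}

impl IoRecorder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record one ranged read issued while running `query`.
    pub fn record(&mut self, query: &str, range: Range<u64>) {
        // An inverted range counts as zero bytes instead of underflowing.
        let len = range.end.saturating_sub(range.start);
        let stats = self.per_query.entry(query.to_string()).or_default();
        stats.request_count += 1;
        stats.total_bytes += len;
        stats.max_request_bytes = stats.max_request_bytes.max(len);
    }

    /// Statistics for a single query, if any request was recorded for it.
    pub fn stats(&self, query: &str) -> Option<&IoStats> {
        self.per_query.get(query)
    }

    /// One summary line per query, ordered by query name.
    pub fn report(&self) -> String {
        let mut out = String::new();
        for (query, s) in &self.per_query {
            out.push_str(&format!(
                "{}: {} requests, {} bytes total, largest request {} bytes\n",
                query, s.request_count, s.total_bytes, s.max_request_bytes
            ));
        }
        out
    }
}
```

A wrapper around the reader would call `record("q1", start..end)` before each request and print `report()` once all queries have run.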

Parquet and Vortex are each selectable, and the bucket is also configurable.

To run the test that uses S3 Express One Zone, set AWS_S3_EXPRESS=true in your .env file or directly in your shell environment.
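For readers who want to check the flag from their own tooling, a minimal sketch of reading it might look like the following. The helper names `parse_flag` and `s3_express_enabled` are hypothetical, and the rule that only a case-insensitive "true" enables the flag is an assumption; how the benchmark binary itself interprets the variable is not shown in this PR.

```rust
use std::env;

/// Hypothetical helper: treat a value as enabled only if it is "true"
/// (case-insensitive, surrounding whitespace ignored). This rule is an assumption.
pub fn parse_flag(value: Option<&str>) -> bool {
    match value {
        Some(v) => v.trim().eq_ignore_ascii_case("true"),
        None => false,
    }
}

/// Reads AWS_S3_EXPRESS from the process environment.
/// An unset or non-UTF-8 variable is treated as disabled.
pub fn s3_express_enabled() -> bool {
    parse_flag(env::var("AWS_S3_EXPRESS").ok().as_deref())
}
```

Note that values in a .env file only reach the process environment if something loads that file first; this sketch reads the environment only.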

a10y commented 6 days ago

Initial results

Attaching two zips, one per storage backend, each with TRACE-level logs of executing all TPC-H queries (except q15) using the Vortex DataFusion provider.

s3express_vortex.zip s3_vortex.zip

Some interesting bits:

Total number of IOs performed for each query:

image

Total time to execute each query (not including table registration):

S3 Standard:

image

S3 Express One Zone:

image