lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.97k stars 227 forks

perf: Benchmark TPCH #1756

Open eddyxu opened 11 months ago

eddyxu commented 11 months ago

Benchmark TPCH performance against Parquet / ORC.

westonpace commented 11 months ago

TPCH, assuming a system is solid, is primarily a benchmark of the hash-join implementation. It's designed to be more compute-intensive and less scan-intensive. So I don't know that it is all that interesting a measurement for file formats.

osawyerr commented 11 months ago

> TPCH, assuming a system is solid, is primarily a benchmark of the hash-join implementation. It's designed to be more compute-intensive and less scan-intensive. So I don't know that it is all that interesting a measurement for file formats.

Some of the TPCH queries have no joins, Query 1 and Query 6 in particular. These have historically been slow on Lance compared with Parquet: Parquet outperformed Lance by orders of magnitude.
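
To make the "no joins" point concrete, here is a minimal sketch of what Query 6 computes: a single pass over `lineitem` with a filter and one aggregate, so its cost is dominated by scanning rather than joining. The predicate follows the standard TPC-H Q6 definition with its usual default parameters; the sample rows and the `tpch_q6` helper name are made up for illustration and are not part of Lance or any benchmark harness.

```python
from datetime import date
from decimal import Decimal

# Hypothetical sample rows; real TPC-H data is generated by the dbgen tool.
LINEITEM = [
    # Qualifies: inside the date window, discount in range, quantity < 24.
    {"l_shipdate": date(1994, 3, 15), "l_discount": Decimal("0.06"),
     "l_quantity": Decimal("10"), "l_extendedprice": Decimal("1000.00")},
    # Qualifies: discount sits on the lower bound (0.05).
    {"l_shipdate": date(1994, 7, 1), "l_discount": Decimal("0.05"),
     "l_quantity": Decimal("23"), "l_extendedprice": Decimal("2000.00")},
    # Rejected: ship date is outside the one-year window.
    {"l_shipdate": date(1995, 1, 1), "l_discount": Decimal("0.06"),
     "l_quantity": Decimal("10"), "l_extendedprice": Decimal("500.00")},
    # Rejected: discount is outside [0.05, 0.07].
    {"l_shipdate": date(1994, 5, 5), "l_discount": Decimal("0.10"),
     "l_quantity": Decimal("5"), "l_extendedprice": Decimal("300.00")},
    # Rejected: quantity is not strictly less than 24.
    {"l_shipdate": date(1994, 12, 31), "l_discount": Decimal("0.07"),
     "l_quantity": Decimal("24"), "l_extendedprice": Decimal("400.00")},
]


def tpch_q6(rows, year=1994, discount=Decimal("0.06"), quantity=Decimal("24")):
    """Scan-only aggregate in the shape of TPC-H Q6 (no joins)."""
    start, end = date(year, 1, 1), date(year + 1, 1, 1)
    lo, hi = discount - Decimal("0.01"), discount + Decimal("0.01")
    return sum(
        (
            r["l_extendedprice"] * r["l_discount"]
            for r in rows
            if start <= r["l_shipdate"] < end
            and lo <= r["l_discount"] <= hi
            and r["l_quantity"] < quantity
        ),
        Decimal("0"),
    )


revenue = tpch_q6(LINEITEM)
print(revenue)
```

Only four columns are touched (`l_shipdate`, `l_discount`, `l_quantity`, `l_extendedprice`), which is why a query of this shape mostly exercises a format's column projection and filtered scan path rather than the engine's join code.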