pola-rs / tpch

MIT License
Use pyarrow to write parquet files #124

Closed jaychia closed 2 months ago

jaychia commented 2 months ago

Addresses: #123

On SCALE_FACTOR=10:

This affects benchmark results quite a bit, depending on how resilient each Parquet reader implementation is to these poorly written Parquet files. However, for the sake of having a benchmark that represents the expected/common case, I think we should write the Parquet files properly!
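For readers without the diff at hand, the general idea of writing the files through pyarrow can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `write_parquet_with_pyarrow`, the sample columns, the output file name, and the `row_group_size` and `compression` values are all assumptions chosen for the example.

```python
from pathlib import Path


def write_parquet_with_pyarrow(columns, path, row_group_size=1_000_000):
    """Write a dict of column name -> list of values to a Parquet file via pyarrow.

    Hypothetical helper for illustration; returns (num_rows, num_row_groups)
    as recorded in the written file's footer.
    """
    # Imported lazily so this module can be loaded without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(columns)
    # row_group_size and compression are illustrative choices (assumptions),
    # not the settings used in the PR.
    pq.write_table(
        table,
        str(path),
        row_group_size=row_group_size,
        compression="snappy",
    )
    # Read back only the footer metadata to see how the file was laid out.
    meta = pq.read_metadata(str(path))
    return meta.num_rows, meta.num_row_groups


if __name__ == "__main__":
    # Tiny made-up sample shaped like two TPC-H lineitem columns.
    sample = {"l_orderkey": [1, 1, 2, 3], "l_quantity": [17.0, 36.0, 8.0, 28.0]}
    print(write_parquet_with_pyarrow(sample, Path("lineitem_sample.parquet"), row_group_size=2))
```

Returning the row and row-group counts from the footer makes it easy to check the layout of a written file, which is the kind of property that can differ between writers.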

ritchie46 commented 2 months ago

I think we should fix the culprit upstream instead of applying a band-aid here.

jaychia commented 2 months ago

That makes sense, @ritchie46. I'll close this PR in favor of an upstream fix.