sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

Improve performance of Row API by storing TypePtr instead of String #204

Closed sadikovi closed 5 years ago

sadikovi commented 5 years ago

This PR updates Row struct to store Vec<(TypePtr, Field)> instead of Vec<(String, Field)>. This improves time when reading Parquet files.

For this I had to change record reader to pass type pointer instead of field name. This unfortunately makes it difficult to test, so I added a macro to generate dummy type with a field name.

Benchmark results:

Before

test record_reader_10k_collect             ... bench:  41,689,686 ns/iter (+/- 33,883,503) = 16 MB/s
test record_reader_stock_simulated_collect ... bench: 335,444,189 ns/iter (+/- 36,688,424) = 3 MB/s
test record_reader_stock_simulated_column  ... bench:  14,664,590 ns/iter (+/- 1,444,286) = 87 MB/s

After

test record_reader_10k_collect             ... bench:  26,658,590 ns/iter (+/- 1,773,580) = 25 MB/s
test record_reader_stock_simulated_collect ... bench: 140,238,171 ns/iter (+/- 6,796,855) = 9 MB/s
test record_reader_stock_simulated_column  ... bench:  15,275,906 ns/iter (+/- 349,117) = 84 MB/s
sadikovi commented 5 years ago

Fails to run $ rustup component add rustfmt-preview, I tried triggering build over the weekend and today - still the same error.

sunchao commented 5 years ago

Yes, there are some issue in the rustup toolchain right now. See the reddit thread. Let me see if we can use a nightly on a particular date to fix this issue.

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 706


Files with Coverage Reduction New Missed Lines %
record/api.rs 27 96.4%
record/reader.rs 72 87.4%
<!-- Total: 99 -->
Totals Coverage Status
Change from base Build 705: -0.007%
Covered Lines: 13175
Relevant Lines: 13756

💛 - Coveralls
sunchao commented 5 years ago

Sorry @sadikovi I forgot about this PR. Since parquet-rs has merged into Apache Arrow, could you file a JIRA and submit a PR for that? Thanks.

sadikovi commented 5 years ago

Yes, no problem. I think it would be good to mark this repo as moved or something similar.

sunchao commented 5 years ago

Yes. Will update the README for the merge.