sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

[WIP] Write support #127

Closed sadikovi closed 6 years ago

sadikovi commented 6 years ago

This is WIP, to test and discuss the API and implementation.

Once design has been approved, I will split work into small chunks and submit them separately.

PR adds low-level write support with the following features:

Limitations:

Testing:

coveralls commented 6 years ago

Pull Request Test Coverage Report for Build 535


Files with Coverage Reduction New Missed Lines %
encodings/encoding.rs 1 94.64%
errors.rs 2 25.0%
file/reader.rs 13 96.64%
util/io.rs 28 60.0%
column/page.rs 39 61.76%
schema/types.rs 72 90.41%
file/metadata.rs 154 45.82%
<!-- Total: 309 -->
Totals Coverage Status
Change from base Build 533: -8.1%
Covered Lines: 10473
Relevant Lines: 12023

💛 - Coveralls
sunchao commented 6 years ago

Thanks @sadikovi ! Very nice work 👍 . I'll try to allocate some time to review this soon.

sadikovi commented 6 years ago

It would be good to check the copy/clone calls that I make in the code.

sadikovi commented 6 years ago

I think it might be worth spending some time and building the code to write Row into Parquet, so we can test write-read roundtrip.

sunchao commented 6 years ago

Hi @sadikovi . Sorry for the delay. I think the WIP PR looks great! Could you split it up into several sub-PRs and submit them for separate review?

It would be good to check the copy/clone calls that I make in the code.

Will check this in the sub-PRs.

I think it might be worth spending some time and building the code to write Row into Parquet, so we can test write-read roundtrip.

Hmm not sure if this is necessary, since we already have encoder level round-trip tests. Also, we can do round-trip test for column reader/writer - will that be enough?

sadikovi commented 6 years ago

Thanks. I will outline and link issues on what needs to be completed. Yes, column reading/writing should be enough.