sunchao / parquet-rs

Apache Parquet implementation in Rust
Apache License 2.0
149 stars 20 forks source link

Arrow schema converter. #185

Closed liurenjie1024 closed 5 years ago

liurenjie1024 commented 5 years ago

This is the first step of adding an arrow reader and writer for parquet-rs. This commit contains a converter which converts parquet schema to arrow schema.

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 665


Files with Coverage Reduction New Missed Lines %
encodings/encoding.rs 1 94.36%
file/properties.rs 4 90.97%
errors.rs 4 26.67%
schema/types.rs 18 96.84%
<!-- Total: 27 -->
Totals Coverage Status
Change from base Build 661: -0.09%
Covered Lines: 13276
Relevant Lines: 13883

💛 - Coveralls
sunchao commented 5 years ago

Thanks @liurenjie1024 . Will take a look soon.

Meanwhile, I created #186 to track the overall progress of adding Arrow support. Please feel free to create more tasks in it. I'm also planning to take up some tasks. :)

liurenjie1024 commented 5 years ago

@sunchao Thanks for opening the issue and comment with my thoughts about other tasks. Current I'm interested in the reader part and working on the reader that converts parquet to arrow.

liurenjie1024 commented 5 years ago

@sunchao Thanks for the review and I've fixed the comments. As for the format part, how do you think about a rustfmt file so that other contributors can follow?

sunchao commented 5 years ago

As for the format part, how do you think about a rustfmt file so that other contributors can follow?

Yes I think it is a good idea. We discussed about this before but later give up since couldn't achieve what we want with rustfmt. Let me try it once more.

liurenjie1024 commented 5 years ago

@sadikovi Sorry for late reply. I've fixed some comments.

liurenjie1024 commented 5 years ago

@sunchao @sadikovi Could you help to review this?

sunchao commented 5 years ago

@liurenjie1024 sure, will take a look soon. Sorry for the delay.

liurenjie1024 commented 5 years ago

@sunchao @sadikovi We have two LGTM, help to merge this?

sunchao commented 5 years ago

@liurenjie1024 I'm in the process of donating parquet-rs to Apache arrow, and think it might be better to do this after the merge is done (we can do a clean start of the arrow-parquet integration after that). I can merge this now too if you have follow-up work that depends on it. It shouldn't matter that much. What do you think?

liurenjie1024 commented 5 years ago

Yes, I think it would be better to merge it after merging with arrow.

sunchao commented 5 years ago

Great. We can create an umbrella JIRA after the merge, to track the parquet-arrow integration. I can help to get this PR in very soon after that.

liurenjie1024 commented 5 years ago

Cool.

sunchao commented 5 years ago

@liurenjie1024 I created a JIRA here: https://issues.apache.org/jira/browse/ARROW-4060. Could you file a PR in the arrow repo for this? Thanks.

sunchao commented 5 years ago

Closing this as ARROW-4060 is already merged into Arrow.