jorgecarleitao / parquet2

Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

Added `serde` support for `RowGroupMetaData`. #202

Closed. youngsofun closed this 1 year ago.

youngsofun commented 2 years ago

Use thrift for the types in `parquet_format_safe`.

https://github.com/jorgecarleitao/parquet2/issues/200

It turns out that as many as 16 types need to implement `Serialize`/`Deserialize`, so working around this from outside the crate is very troublesome.

codecov-commenter commented 2 years ago

Codecov Report

Base: 85.69% // Head: 85.69% // No change to project coverage :thumbsup:

Coverage data is based on head (b1678c9) compared to base (21a7f98). Patch has no changes to coverable lines.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #202   +/-   ##
=======================================
  Coverage   85.69%   85.69%
=======================================
  Files          84       84
  Lines        8254     8254
=======================================
  Hits         7073     7073
  Misses       1181     1181
```

| Impacted Files | Coverage Δ |
|---|---|
| src/metadata/column_chunk_metadata.rs | `69.51% <ø> (ø)` |
| src/metadata/column_descriptor.rs | `100.00% <ø> (ø)` |
| src/metadata/column_order.rs | `0.00% <ø> (ø)` |
| src/metadata/row_metadata.rs | `40.00% <ø> (ø)` |
| src/metadata/schema_descriptor.rs | `91.39% <ø> (ø)` |
| src/metadata/sort.rs | `54.54% <ø> (ø)` |
| src/parquet_bridge.rs | `78.90% <ø> (ø)` |
| src/schema/types/basic_type.rs | `100.00% <ø> (ø)` |
| src/schema/types/converted_type.rs | `93.87% <ø> (ø)` |
| src/schema/types/parquet_type.rs | `58.73% <ø> (ø)` |
| ... and 1 more | |


jorgecarleitao commented 2 years ago

This would add a significant dependency to this crate. Would it make sense to make this an optional feature named `serde`? Something like this: https://github.com/jorgecarleitao/arrow2/blob/main/src/datatypes/field.rs#L3
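For readers unfamiliar with the pattern being suggested, here is a minimal sketch of gating serde derives behind an optional Cargo feature. The struct `RowGroupInfo` and the helper `serde_enabled` are hypothetical stand-ins for illustration, not the actual parquet2 definitions or the code merged in this PR. The sketch assumes `serde` is declared in `Cargo.toml` as an optional dependency with its `derive` feature enabled, which makes Cargo expose a feature of the same name.

```rust
// Assumed Cargo.toml entry (illustrative, not copied from parquet2):
//   serde = { version = "1", features = ["derive"], optional = true }

// Hypothetical stand-in for a metadata struct; not the real `RowGroupMetaData`.
// The serde derives are only applied when the `serde` feature is enabled, so
// users who do not need serialization do not pay for the extra dependency.
#[derive(Debug, Clone, PartialEq)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct RowGroupInfo {
    pub num_rows: usize,
    pub total_byte_size: usize,
}

/// Reports whether this build was compiled with the optional `serde` feature.
pub fn serde_enabled() -> bool {
    cfg!(feature = "serde")
}

fn main() {
    let rg = RowGroupInfo { num_rows: 10, total_byte_size: 1024 };
    println!("{:?}, serde enabled: {}", rg, serde_enabled());
}
```

With the feature disabled, the `cfg_attr` line expands to nothing and the code compiles without `serde` at all. A downstream crate would opt in with something like `features = ["serde"]` on its dependency declaration.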

youngsofun commented 2 years ago

@jorgecarleitao Refactor done; it is now an optional feature.

youngsofun commented 2 years ago

@jorgecarleitao Is there anything else that needs to be improved?

jorgecarleitao commented 1 year ago

Awesome, thanks a lot!

youngsofun commented 1 year ago

Is it ready to bump to 0.16.4? @jorgecarleitao

jorgecarleitao commented 1 year ago

We need to bump to 0.17 - there are breaking changes in main. Released :)