jorgecarleitao / arrow2

Transmute-free Rust library to work with the Arrow format
Apache License 2.0
1.07k stars 220 forks source link

Correctly coerce Parquet Int96 timestamps into requested TimeUnits #1532

Closed jaychia closed 11 months ago

jaychia commented 11 months ago

Addresses the first part of issue #1527

Instead of always naively parsing Parquet Int96 timestamps into TimeUnit::Nanosecond, we match on the requested timeunit and perform timeunit-specific parsing

This makes parsing safer when reading Int96 timestamps that are outside of the range of timestamp[ns] (e.g. timestamps with dates like the years 1000 or 3000) instead of the current behavior which is to always parse the timestamps with the Nanosecond timeunit, and overflow.

jaychia commented 11 months ago

Also happy to move the int96_to_i64_*s logic over into the parquet2 repo!

ritchie46 commented 11 months ago

Also happy to move the int96_toi64*s

I believe that all arrow datatype related conversions are done here, so that is fine.

codecov[bot] commented 11 months ago

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.66% :warning:

Comparison is base (b09e580) 83.73% compared to head (b749735) 83.07%. Report is 49 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1532 +/- ## ========================================== - Coverage 83.73% 83.07% -0.66% ========================================== Files 389 389 Lines 41811 42632 +821 ========================================== + Hits 35009 35417 +408 - Misses 6802 7215 +413 ``` | [Files Changed](https://app.codecov.io/gh/jorgecarleitao/arrow2/pull/1532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao) | Coverage Δ | | |---|---|---| | [src/io/parquet/read/deserialize/simple.rs](https://app.codecov.io/gh/jorgecarleitao/arrow2/pull/1532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao#diff-c3JjL2lvL3BhcnF1ZXQvcmVhZC9kZXNlcmlhbGl6ZS9zaW1wbGUucnM=) | `86.27% <100.00%> (+1.76%)` | :arrow_up: | ... and [54 files with indirect coverage changes](https://app.codecov.io/gh/jorgecarleitao/arrow2/pull/1532/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jaychia commented 11 months ago

Second PR is up! #1533

jaychia commented 11 months ago

@ritchie46 any chance you could retrigger CI - Seems to be failing on some flaky token issues?

{'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
jaychia commented 11 months ago

Thanks for the stamp @sundy-li!

I had to make a new PR to fix lints since they were re-enabled last week. Would appreciate some help with launching CI :)

sundy-li commented 11 months ago

Merged, will create a new pr to fix the clippy.