Closed jaychia closed 1 year ago
Patch coverage: 95.23% and project coverage change: +0.02% :tada:
Comparison is base (87ab844) 83.02% compared to head (4e4279f) 83.05%.
Hi @ritchie46 and @sundy-li
Here's a follow-up PR to https://github.com/jorgecarleitao/arrow2/pull/1532
(see PR description for more details)
BTW, int96 seems to be deprecated in parquet, it's not a stable feature. https://issues.apache.org/jira/browse/PARQUET-323
Indeed, but it is still widely used and supported by many systems for backwards-compatibility reasons.
Unfortunately, because Parquet is a long-lived format and many enterprises use old versions of data frameworks, these deprecated features tend to live on long after their deprecation :)
This PR addresses part 2 of #1527
It makes arrow2's Parquet schema inference configurable, so that users can choose how Parquet Int96 fields are inferred as arrow Timestamp fields.
Changes:
- Adds a `SchemaInferenceOptions` struct which allows for configurability of how schema inference on Parquet files is performed
- Adds an `int96_coerce_to_timeunit` flag to configure how Parquet int96 fields are inferred as arrow Timestamps
- Adds `*_with_options` variants of the `infer_schema` and `parquet_to_arrow_schema` APIs to take in the options
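
To illustrate why the target time unit matters, here is a minimal, self-contained sketch. It does not use arrow2: the `TimeUnit` enum, the `SchemaInferenceOptions` struct, its `Nanosecond` default, and the `int96_to_timestamp` and `encode_int96` helpers are all hypothetical stand-ins written for this example, and only the field name `int96_coerce_to_timeunit` comes from the PR description. It relies on the conventional Int96 timestamp layout (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes of Julian day number) and shows that a coarser unit can represent dates that would overflow an `i64` count of nanoseconds.

```rust
// Hypothetical stand-ins for illustration only; NOT arrow2's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, Copy)]
struct SchemaInferenceOptions {
    // Field name taken from the PR description; the type is an assumption.
    int96_coerce_to_timeunit: TimeUnit,
}

impl Default for SchemaInferenceOptions {
    fn default() -> Self {
        // Assumed default for this sketch.
        Self { int96_coerce_to_timeunit: TimeUnit::Nanosecond }
    }
}

// Julian day number of 1970-01-01 (the Unix epoch).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const SECONDS_PER_DAY: i64 = 86_400;

// Packs a Julian day and nanoseconds-of-day into the 12-byte Int96 layout.
fn encode_int96(julian_day: u32, nanos_of_day: u64) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..].copy_from_slice(&julian_day.to_le_bytes());
    out
}

// Decodes an Int96 timestamp into a count of the configured unit since the
// Unix epoch. Returns None if the result does not fit in an i64.
fn int96_to_timestamp(value: [u8; 12], options: &SchemaInferenceOptions) -> Option<i64> {
    let mut lo = [0u8; 8];
    lo.copy_from_slice(&value[..8]);
    let mut hi = [0u8; 4];
    hi.copy_from_slice(&value[8..]);

    let nanos_of_day = u64::from_le_bytes(lo) as i64;
    let julian_day = u32::from_le_bytes(hi) as i64;

    let seconds = (julian_day - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
    let (units_per_second, nanos_per_unit): (i64, i64) = match options.int96_coerce_to_timeunit {
        TimeUnit::Second => (1, 1_000_000_000),
        TimeUnit::Millisecond => (1_000, 1_000_000),
        TimeUnit::Microsecond => (1_000_000, 1_000),
        TimeUnit::Nanosecond => (1_000_000_000, 1),
    };

    // Checked arithmetic surfaces overflow instead of silently wrapping.
    seconds
        .checked_mul(units_per_second)?
        .checked_add(nanos_of_day / nanos_per_unit)
}
```

In this sketch, a value roughly a thousand years past the epoch decodes to `None` with the nanosecond default but to a valid count when the options request microseconds, which is the kind of choice the options struct described above is meant to expose.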