segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Support reading int96 as timestamp #506

Open chelseajonesr opened 1 year ago

chelseajonesr commented 1 year ago

By default, Spark writes Parquet files with int96 timestamps, even though the type has long been deprecated. The format is described and discussed here: https://github.com/apache/parquet-format/pull/49

This PR adds support for reading int96 values into timestamps. (Just reading, not writing.)

During implementation, I found that the convertToType() function in convert.go was converting in the wrong direction. I've added a "string to int" test scenario in convert_test.go, which fails with the existing code and succeeds with the updated version. Converting types with null values also failed in some situations, so I added null checks to the ConvertValue functions in type.go and a "nils" scenario in convert_test.go.

kevinburkesegment commented 1 year ago

Apologies for making more work for you, but we've decided to move development of this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long-term success. We appreciate your contribution and would be grateful if you could reopen this PR there if it is still relevant.