segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

fix issue 254 #265

Closed achille-roussel closed 2 years ago

achille-roussel commented 2 years ago

Fixes #254

When a page boundary was crossed while loading column values, the page buffer was recycled through a sync.Pool, which invalidated values already loaded for other columns even though the whole row had not yet been returned to the application.

This PR may introduce a performance regression on the read paths when rows contain values of type BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY, since we now copy these values instead of taking references to the underlying page buffer. This will be addressed in a follow-up.
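
For readers unfamiliar with this class of bug, here is a minimal sketch of the aliasing problem and of the copy-based fix. It is not the library's code: `value`, `readAliased`, `readCopied`, and `recyclePage` are hypothetical names made up for illustration, and the buffer reuse is simulated with an explicit overwrite instead of a real sync.Pool so that the behavior is deterministic.

```go
package main

import "fmt"

// value is a hypothetical stand-in for a BYTE_ARRAY column value.
type value struct {
	data []byte
}

// readAliased returns a value whose bytes point into the page buffer.
// It is cheap, but only valid for as long as the page buffer is untouched.
func readAliased(page []byte, off, n int) value {
	return value{data: page[off : off+n]}
}

// readCopied returns a value that owns a private copy of its bytes.
// It costs an allocation and a copy, but survives reuse of the page buffer.
func readCopied(page []byte, off, n int) value {
	return value{data: append([]byte{}, page[off:off+n]...)}
}

// recyclePage simulates the page buffer being handed back to a pool and
// reused to hold the next page (assumption: reuse overwrites it in place).
func recyclePage(page []byte, next string) {
	copy(page, next)
}

// demo reads the same bytes both ways, recycles the page before the row is
// "returned", and reports what each value holds afterwards.
func demo() (aliased, copied string) {
	page := []byte("hello world")
	a := readAliased(page, 0, 5)
	c := readCopied(page, 0, 5)
	recyclePage(page, "XXXXXXXXXXX")
	return string(a.data), string(c.data)
}

func main() {
	aliased, copied := demo()
	fmt.Println("aliased:", aliased) // content replaced by the recycled page
	fmt.Println("copied: ", copied)  // original content preserved
}
```

The aliased value silently changes once the buffer is reused, while the copied value keeps its original bytes. That extra copy is the source of the possible read-path regression mentioned above.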

mdisibio commented 2 years ago

Can confirm this fixes my roundtrip testing which surfaced the issue originally.