segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

use memory allocator in byte array dictionary #445

Closed achille-roussel closed 1 year ago

achille-roussel commented 1 year ago

Indexing BYTE_ARRAY columns is a frequent use case. At high throughput, creating a string copy of each value to use as a dictionary key results in excessive time spent in memory allocation.

This PR addresses the issue by using a local allocator in the byteArrayDictionary type to amortize the cost of memory allocations for the dictionary keys.
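
To illustrate the idea, here is a minimal sketch of a chunked allocator that backs dictionary keys. It is not the code from this PR: `keyAllocator`, `byteArrayIndex`, `copyString`, `insert`, and the 64 KiB `chunkSize` are hypothetical names and values chosen for the example, and it assumes Go 1.20 or later for `unsafe.String`. Keys are copied into large shared chunks, so the allocator is called once per chunk rather than once per key.

```go
package main

import (
	"fmt"
	"unsafe"
)

// chunkSize is an assumed value for this sketch, not taken from the PR.
const chunkSize = 64 * 1024

// keyAllocator is a hypothetical arena-style allocator: it hands out
// sub-slices of a large chunk so many keys share one heap allocation.
type keyAllocator struct {
	chunk []byte
}

// copyString copies b into the current chunk and returns a string that
// points at the copy. Bytes already written to a chunk are never modified
// afterwards, which keeps the returned strings effectively immutable.
func (a *keyAllocator) copyString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	if len(b) > cap(a.chunk)-len(a.chunk) {
		// Not enough room left: start a new chunk. The old chunk stays
		// alive for as long as any key string still references it.
		size := chunkSize
		if len(b) > size {
			size = len(b)
		}
		a.chunk = make([]byte, 0, size)
	}
	start := len(a.chunk)
	a.chunk = append(a.chunk, b...) // never reallocates: capacity was checked
	return unsafe.String(&a.chunk[start], len(b))
}

// byteArrayIndex is a hypothetical stand-in for a byte array dictionary:
// it maps each distinct value to the position at which it was first seen.
type byteArrayIndex struct {
	alloc keyAllocator
	index map[string]int32
}

func newByteArrayIndex() *byteArrayIndex {
	return &byteArrayIndex{index: make(map[string]int32)}
}

// insert returns the index of value, adding it to the dictionary if needed.
func (d *byteArrayIndex) insert(value []byte) int32 {
	// The compiler avoids allocating for string(value) in a map lookup.
	if i, ok := d.index[string(value)]; ok {
		return i
	}
	i := int32(len(d.index))
	// Only new keys are copied, and the copy goes into the shared chunk.
	d.index[d.alloc.copyString(value)] = i
	return i
}

func main() {
	d := newByteArrayIndex()
	for _, v := range []string{"apple", "banana", "apple"} {
		fmt.Println(v, d.insert([]byte(v)))
	}
}
```

The trade-off in a design like this is that a chunk cannot be reclaimed until every key stored in it is unreachable, which suits a dictionary whose keys all share the same lifetime.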