segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

optimize DELTA_LENGTH_BYTE_ARRAY decoding #291

Closed: achille-roussel closed this 2 years ago

achille-roussel commented 2 years ago

Follow-up to #289: this PR optimizes the decoding of DELTA_LENGTH_BYTE_ARRAY pages.
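For readers unfamiliar with the encoding: per the Parquet format, a DELTA_LENGTH_BYTE_ARRAY page stores all value lengths first (DELTA_BINARY_PACKED), followed by the value bytes concatenated back to back. The sketch below only illustrates the second half of decoding, and it is not the code from this PR. `splitByLengths` is a hypothetical helper, and it assumes the lengths have already been decoded into a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// splitByLengths is a hypothetical helper, not part of parquet-go.
// Assumptions: lengths were already decoded from the DELTA_BINARY_PACKED
// prefix of the page, and data holds the concatenated value bytes.
func splitByLengths(lengths []int32, data []byte) ([][]byte, error) {
	values := make([][]byte, 0, len(lengths))
	offset := 0
	for _, n := range lengths {
		if n < 0 {
			return nil, errors.New("negative value length")
		}
		if int(n) > len(data)-offset {
			return nil, errors.New("value length exceeds page data")
		}
		end := offset + int(n)
		// The three-index slice caps the capacity, so appending to one
		// value cannot overwrite the bytes of the next one.
		values = append(values, data[offset:end:end])
		offset = end
	}
	return values, nil
}

func main() {
	values, err := splitByLengths([]int32{5, 5, 7}, []byte("HelloWorldParquet"))
	if err != nil {
		panic(err)
	}
	for _, v := range values {
		fmt.Printf("%s\n", v)
	}
}
```

The values are sub-slices of the page buffer rather than copies, which keeps the loop allocation-free apart from the result slice; whether the library makes the same trade-off is not stated here.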

This change roughly doubles decoding throughput compared to the parent branch:

name                                       old time/op    new time/op    delta
Decode/DELTA_LENGTH_BYTE_ARRAY/byte_array    24.0µs ± 0%    12.0µs ± 0%   -50.07%  (p=0.000 n=10+10)

name                                       old speed      new speed      delta
Decode/DELTA_LENGTH_BYTE_ARRAY/byte_array  4.53GB/s ± 0%  9.07GB/s ± 0%  +100.28%  (p=0.000 n=10+10)

name                                       old value/s    new value/s    delta
Decode/DELTA_LENGTH_BYTE_ARRAY/byte_array      416M ± 0%      834M ± 0%  +100.29%  (p=0.000 n=10+10)
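As a consistency check on the three tables above: for a fixed workload, throughput is inversely proportional to time/op, so a time delta of -50.07% should correspond to a throughput delta of about +100.28%. The snippet below is a minimal sketch of that arithmetic; `throughputGain` is a hypothetical helper written for this note, not part of benchstat or parquet-go.

```go
package main

import "fmt"

// throughputGain converts a relative change in time/op (for example
// -0.5007 for -50.07%) into the relative change in throughput, assuming
// the amount of work per operation is unchanged.
func throughputGain(timeDelta float64) float64 {
	return 1/(1+timeDelta) - 1
}

func main() {
	// -50.07% is the time/op delta reported in the first table.
	fmt.Printf("%+.2f%%\n", throughputGain(-0.5007)*100) // prints "+100.28%"
}
```

The result matches the reported speed delta; the value/s delta (+100.29%) differs in the last digit, presumably because benchstat computes each delta from unrounded measurements.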