segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

optimize DELTA_BYTE_ARRAY decoding #294

Closed achille-roussel closed 2 years ago

achille-roussel commented 2 years ago

Following up on https://github.com/segmentio/parquet-go/pull/289, this PR optimizes the decoding of DELTA_BYTE_ARRAY pages.

An interesting consequence of changing the in-memory representation of byte array values is that we can merge the decodeByteArrayAVX2 and decodeFixedLenByteArrayAVX2 functions into one, since the memory layout is now the same for variable and fixed length values. Less code to maintain, fewer opportunities for bugs, and more efficiency!
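For readers unfamiliar with the encoding: the Parquet format defines DELTA_BYTE_ARRAY as a list of prefix lengths (how many leading bytes each value shares with the previous one) followed by the remaining suffixes. The sketch below is a plain-Go illustration of that reconstruction step only. The function name `decodeDeltaByteArray`, its signature, and the output layout (one contiguous buffer plus an offsets slice) are assumptions made for this example; it is not the code in this PR, nor necessarily the representation introduced in #289, and it ignores the delta-encoded integer streams and the AVX2 paths entirely.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeDeltaByteArray is a hypothetical helper, not part of parquet-go.
// It rebuilds values from already-decoded prefix lengths and suffixes,
// writing them back to back into data. Value i is data[offsets[i]:offsets[i+1]].
func decodeDeltaByteArray(prefixLens []int, suffixes [][]byte) (data []byte, offsets []int, err error) {
	if len(prefixLens) != len(suffixes) {
		return nil, nil, errors.New("prefix and suffix counts differ")
	}
	offsets = make([]int, 1, len(suffixes)+1) // offsets[0] is 0
	prevStart := 0                            // start of the previous value in data
	for i, suffix := range suffixes {
		n := prefixLens[i]
		prevLen := len(data) - prevStart
		if n < 0 || n > prevLen {
			return nil, nil, fmt.Errorf("value %d: prefix length %d exceeds previous value length %d", i, n, prevLen)
		}
		start := len(data)
		// Copy the shared prefix from the previous value, then the new suffix.
		data = append(data, data[prevStart:prevStart+n]...)
		data = append(data, suffix...)
		prevStart = start
		offsets = append(offsets, len(data))
	}
	return data, offsets, nil
}

func main() {
	// "apple", "apply", "banana" expressed as prefix lengths and suffixes.
	prefixLens := []int{0, 4, 0}
	suffixes := [][]byte{[]byte("apple"), []byte("y"), []byte("banana")}

	data, offsets, err := decodeDeltaByteArray(prefixLens, suffixes)
	if err != nil {
		panic(err)
	}
	for i := 0; i+1 < len(offsets); i++ {
		fmt.Println(string(data[offsets[i]:offsets[i+1]]))
	}
}
```

In this sketch every value ends up in the same buffer regardless of its length, which is the kind of uniform layout that lets one decoding routine serve both variable and fixed length values.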

name                                old time/op   new time/op    delta
Decode/DELTA_BYTE_ARRAY/byte_array    131µs ± 0%      37µs ± 0%   -71.54%  (p=0.000 n=10+10)

name                                old speed     new speed      delta
Decode/DELTA_BYTE_ARRAY/byte_array  831MB/s ± 0%  2918MB/s ± 0%  +251.36%  (p=0.000 n=10+10)

name                                old value/s   new value/s    delta
Decode/DELTA_BYTE_ARRAY/byte_array    76.4M ± 0%    268.3M ± 0%  +251.35%  (p=0.000 n=10+10)