spotify / magnolify

A collection of Magnolia add-on modules
https://spotify.github.io/magnolify
Apache License 2.0

Add benchmarks for magnolify-parquet vs parquet-avro R/W #1040

Closed: clairemcginty closed this 2 months ago

clairemcginty commented 2 months ago

Adds benchmarks for Parquet read/write performance, for both magnolify-parquet and parquet-avro (we don't own parquet-avro, but IMO it's a helpful baseline to compare against).

Parquet is a little tricky in that it doesn't have a granular "write/read a single record to/from a file" operation, due to its complex file structure and encodings. This benchmark sets up an in-memory page store that can read or write Parquet "groups", which are Parquet's internal record structure. Read/write is invoked with a record type T and a matching RecordConverter[T], which converts between Parquet groups and either case classes (magnolify-parquet) or Avro records (parquet-avro). Thus, what we're benchmarking here is Group-to-record and record-to-Group conversion, which is the core functionality of magnolify-parquet 👍
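To make the setup concrete, here is a minimal, stdlib-only Scala sketch of the same idea: records are round-tripped through an in-memory store of "groups", so the only work being timed is record/group conversion. Everything in it is hypothetical. `Group` is a plain `Map` stand-in for Parquet's group structure, `InMemoryGroupStore` is not the benchmark's actual page store, and the `toGroup`/`fromGroup` shape of `RecordConverter[T]` is assumed (only the name comes from the description above). The real benchmark goes through parquet-mr's column I/O and JMH, not this toy timing loop.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a Parquet "group" (Parquet's internal record
// structure): here just a map from field name to value.
final case class Group(fields: Map[String, Any])

// Hypothetical shape for RecordConverter[T]. In the real benchmark,
// magnolify-parquet (case classes) or parquet-avro (Avro records) fills this role.
trait RecordConverter[T] {
  def toGroup(record: T): Group  // the direction exercised by "write"
  def fromGroup(group: Group): T // the direction exercised by "read"
}

// Hypothetical in-memory store: holds groups instead of encoded pages, so no
// file I/O happens and only conversion cost is timed.
final class InMemoryGroupStore {
  private val groups = ArrayBuffer.empty[Group]

  def write[T](record: T)(implicit conv: RecordConverter[T]): Unit =
    groups += conv.toGroup(record)

  def read[T](index: Int)(implicit conv: RecordConverter[T]): T =
    conv.fromGroup(groups(index))

  def size: Int = groups.size
}

// Example record type with a hand-written converter; magnolify-parquet would
// derive the equivalent mapping for a case class automatically.
final case class User(id: Long, name: String)

object Converters {
  implicit val userConverter: RecordConverter[User] = new RecordConverter[User] {
    def toGroup(u: User): Group = Group(Map("id" -> u.id, "name" -> u.name))
    def fromGroup(g: Group): User =
      User(g.fields("id").asInstanceOf[Long], g.fields("name").asInstanceOf[String])
  }
}

object Timing {
  // Crude average time per op in nanoseconds, loosely mirroring JMH's avgt
  // mode; there is no warmup or forking, so use JMH for comparable numbers.
  def avgNanos(iterations: Int)(op: => Unit): Double = {
    val start = System.nanoTime()
    var i = 0
    while (i < iterations) { op; i += 1 }
    (System.nanoTime() - start).toDouble / iterations
  }
}
```

Usage: after `import Converters._`, calling `store.write(User(1L, "alice"))` on a fresh `InMemoryGroupStore` and then `store.read[User](0)` gives back an equal `User`. Keeping groups in memory is what lets a per-record `ns/op` figure mean "conversion cost" rather than "conversion plus encoding plus disk".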

Results (run locally on a 64 GB M1 Mac with OpenJDK 17.0.5):

```
% sbt "jmh/jmh:run -i 10 -wi 10 -f1 -t .*parquet.*"
[info] Benchmark                           Mode  Cnt      Score     Error  Units
[info] ParquetBench.parquetReadAvro        avgt   10  12693.357 ± 208.175  ns/op
[info] ParquetBench.parquetReadMagnolify   avgt   10  13695.172 ± 311.972  ns/op
[info] ParquetBench.parquetWriteAvro       avgt   10   9621.541 ±  81.569  ns/op
[info] ParquetBench.parquetWriteMagnolify  avgt   10   5527.228 ±  70.377  ns/op
```
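Reading the table as ratios of the reported average times (lower is better). This uses only the numbers above, from a single fork on one machine, so treat it as indicative:

```math
\begin{aligned}
\text{write: } \frac{9621.541}{5527.228} &\approx 1.74 \quad (\text{magnolify-parquet} \approx 1.74\times \text{ faster than parquet-avro}) \\
\text{read: } \frac{13695.172}{12693.357} &\approx 1.079 \quad (\text{magnolify-parquet} \approx 7.9\% \text{ slower than parquet-avro})
\end{aligned}
```

On the read side the gap (13695.172 − 12693.357 ≈ 1001.8 ns/op) is larger than the two reported errors combined (208.175 + 311.972 ≈ 520.1 ns/op), so the error intervals do not overlap.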
codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.50%. Comparing base (a3708ba) to head (bded9b4). Report is 4 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #1040   +/-   ##
=======================================
  Coverage   95.50%   95.50%
=======================================
  Files          56       56
  Lines        1980     1980
  Branches      186      186
=======================================
  Hits         1891     1891
  Misses         89       89
```
