Closed clairemcginty closed 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.50%. Comparing base (
a3708ba
) to head (bded9b4
). Report is 4 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Adds benchmarks for Parquet read/write performance, for both magnolify-parquet and parquet-avro (although we don't own parquet-avro, it's helpful to compare against IMO).
Parquet is a little tricky in that it doesn't have a granular "write/read a single record to/from a file" operation due to its complex file structure/encodings. This benchmark sets up an in-memory page store that can can read or write Parquet "groups", which are Parquet's internal record structure. Read/write is invoked with a record type
T
and a matchingRecordConverter[T]
, which converts either case classes (magnolify-parquet) or Avro records (parquet-avro) into Parquet groups. Thus, what we're benchmarking here is Group-to-record and record-to-Group conversion, which is the core functionality of magnolify-parquet 👍Results (run locally w 64GB M1 mac + OpenJDK 17.0.5):