shnapz opened 1 year ago
I was curious about performance differences between Magnolify and Scio Avro macros, so I set up a quick JMH test to compute average runtimes of Avro macro operations.
All JMH tests were run with 10 iterations, 10 warmup iterations, 1 fork, and 1 thread on my M1 Mac with Java 11.
Method | `magnolify.avro.AvroType` | `com.spotify.scio.avro.types.AvroType`
---|---|---
`to` (CC => GenericRecord) | 1,872.553 ns/op | 241,657.438 ns/op
`from` (GenericRecord => CC) | 848.955 ns/op | 1,037.100 ns/op
`schema` | 2.820 ns/op | 2,994.886 ns/op
The Scio `AvroType#to` figure is not a typo 😬 I'd appreciate a second pair of eyes on the JMH test setup to rule out user-code error, but yeah: scary, and extra motivation for us to migrate!
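For readers unfamiliar with how an "average ns/op" figure like the ones above is produced, here is a minimal sketch of the idea. This is *not* the actual JMH setup — JMH additionally forks the JVM, guards against dead-code elimination with blackholes, and interleaves warmup/measurement phases — but it illustrates the warmup-then-measure-then-average structure:

```scala
// Minimal sketch of computing average ns/op for an operation.
// This is NOT JMH; it only illustrates the warmup + measurement
// structure behind figures like "1,872.553 ns/op".
object AvgNsPerOp {
  def measure(warmup: Int, iterations: Int)(op: () => Any): Double = {
    // Warmup phase: let the JIT compile the hot path first.
    var i = 0
    while (i < warmup) { op(); i += 1 }
    // Measurement phase: time the whole loop, divide by op count.
    val start = System.nanoTime()
    var j = 0
    while (j < iterations) { op(); j += 1 }
    (System.nanoTime() - start).toDouble / iterations
  }
}
```

Usage: `AvgNsPerOp.measure(10000, 10000)(() => "a" * 16)` returns the average nanoseconds per operation for the measured closure.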
IMO we should deprecate Scio's `AvroType` in the next release and add read/write methods to scio-avro that support the type bound `T: magnolify.AvroType`. Ideally we could overload `data.saveAsTypedAvroFile`/`sc.typedAvroFile` with separate signatures supporting `T <: HasAvroAnnotation` and `T: magnolify.AvroType`, so that migration would be easier on the user side, but I'm not sure if the compiler will complain.
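To make the two proposed bounds concrete, here is a sketch using stand-in types (`HasAvroAnnotation` and `AvroType` below are simplified placeholders, not the real Scio/Magnolify definitions). Whether both signatures could share the single name `saveAsTypedAvroFile` as true overloads is exactly the open question above, so the sketch uses distinct method names to be sure it compiles:

```scala
// Stand-ins only: the real traits live in Scio and Magnolify.
trait HasAvroAnnotation                       // legacy marker trait (placeholder)
trait AvroType[T] { def schemaName: String }  // Magnolify-style type class (placeholder)

object TypedAvroIO {
  // Legacy path: T must extend the annotation marker (upper bound).
  def saveLegacy[T <: HasAvroAnnotation](data: Seq[T]): String =
    "legacy"

  // Magnolify path: the context bound T: AvroType desugars to an
  // extra implicit parameter (implicit at: AvroType[T]).
  def saveMagnolify[T: AvroType](data: Seq[T]): String =
    implicitly[AvroType[T]].schemaName
}

case class Legacy(x: Int) extends HasAvroAnnotation

case class Modern(x: Int)
object Modern {
  implicit val at: AvroType[Modern] =
    new AvroType[Modern] { def schemaName = "Modern" }
}
```

The key design difference: the legacy bound is checked structurally (inheritance), while the Magnolify bound is satisfied by an implicit type-class instance derived at compile time.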
As a follow-up, I was curious about BQ macro performance for Magnolify vs. vanilla Scio, so I ran those benchmarks too (same run parameters as for Avro):
Method | `magnolify.bigquery.TableRowType` | `com.spotify.scio.bigquery.types.BigQueryType`
---|---|---
`to` (CC => TableRow) | 19,622.473 ns/op | 33,009.115 ns/op
`from` (TableRow => CC) | 15,935.708 ns/op | 15,213.949 ns/op
`schema` | 2.808 ns/op | 778.948 ns/op
The numbers here suggest that there may be some room for improvement on the Magnolify side 🤔
Currently scio-avro uses `com.spotify.scio.avro.types.AvroType[T]`, which implements a "case class <-> GenericRecord" codec using `scala.reflect.macros._` in `com.spotify.scio.avro.types.TypeProvider`. This was implemented in 2017-2019. The proposal is to replace it with `magnolify.avro.AvroType[T]`.