Move scio-avro to magnolify-avro

shnapz commented 1 year ago

Currently scio-avro uses com.spotify.scio.avro.types.AvroType[T] that implements "case class <-> GenericRecord" codec using scala.reflect.macros._ in com.spotify.scio.avro.types.TypeProvider. This was implemented in 2017-2019.

Move AvroIO to a newer implementation based on magnolify-avro using magnolify.avro.AvroType[T].
Identify potential compatibility issues for users

clairemcginty commented 1 year ago

I was curious about performance differences between Magnolify and Scio Avro macros, so I set up a quick JMH test to compute average runtimes of Avro macro operations.

all jmh tests were run with 10 iterations, 10 warmup iterations, 1 fork, 1 thread on my m1 mac, java 11.

Method	magnolify.avro.AvroType	com.spotify.scio.avro.types.AvroType
to (CC => GenericRecord)	1,872.553 ns/op	241,657.438 ns/op
from (GenericRecord => CC)	848.955 ns/op	1,037.100 ns/op
schema	2.820 ns/op	2,994.886 ns/op

The Scio AvroType#to figure is not a typo 😬 would appreciate a second pair of eyes on the jmh test setup to rule out user code error, but yeah - scary, and extra motivation for us to migrate!

IMO we should deprecate Scio's AvroType in the next release and add read/write methods to scio-avro that support the type bound T : magnolify.AvroType. It would be ideal if we could overload data.saveAsTypedAvroFile/sc.typedAvroFile to have a separate signature supporting T <: HasAvroAnnotation and T: magnolify.AvroType, so that it would be an easier migration on the user side, but I'm not sure if the compiler will complain.

clairemcginty commented 1 year ago

As a followup, I was curious about BQ macro performance for Magnolify vs vanilla Scio, so I ran those too (same run parameters as for avro):

Method	magnolify.bigquery.TableRowType	com.spotify.scio.bigquery.types.BigQueryType
to (CC => TableRow)	19,622.473 ns/op	33,009.115 ns/op
from (TableRow => CC)	15,935.708 ns/op	15,213.949 ns/op
schema	2.808 ns/op	778.948 ns/op

the numbers here suggest that there may be some room for improvement on the Magnolify side 🤔

spotify / scio

Move scio-avro to magnolify-avro #4540