julianpeeters / avrohugger

Generate Scala case class definitions from Avro schemas
Apache License 2.0

Output avro enum as Scala String #71

Closed: jon-morra-zefr closed this issue 6 years ago

jon-morra-zefr commented 7 years ago

Spark does not have a native representation of enums. This makes using avrohugger with Spark very difficult when the avro type has enums in it. The easiest way around this is to add the ability for avrohugger to read in enum types from the avro schema and generate these types as String. While I recognize this is not an accurate mapping of the supplied type, this will make loading typed data into Spark a LOT easier.
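
To make the request concrete, here is a minimal sketch of the two shapes under discussion. The names `Suit`, `CardWithEnum`, `CardWithString`, and `EnumAsStringSketch` are hypothetical, and neither definition is actual avrohugger output; the sketch only contrasts an enum-typed field with the String-typed field being asked for.

```scala
// Hypothetical Avro enum "Suit" with symbols SPADES, HEARTS, DIAMONDS, CLUBS.

// Shape 1: the enum modeled as a Scala Enumeration. This is the accurate
// mapping, but per this issue Spark has no native representation for it.
object Suit extends Enumeration {
  val SPADES, HEARTS, DIAMONDS, CLUBS = Value
}
final case class CardWithEnum(rank: Int, suit: Suit.Value)

// Shape 2: the requested mapping, where the enum field is a plain String.
final case class CardWithString(rank: Int, suit: String)

object EnumAsStringSketch {
  // Enum shape -> String shape: keep the symbol name as text.
  def toStringShape(c: CardWithEnum): CardWithString =
    CardWithString(c.rank, c.suit.toString)

  // String shape -> enum shape: look the symbol up by name.
  // Throws NoSuchElementException if the string is not a known symbol.
  def toEnumShape(c: CardWithString): CardWithEnum =
    CardWithEnum(c.rank, Suit.withName(c.suit))
}
```

The String shape gives up compile-time checking of the symbol set, which is the trade-off acknowledged above.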

julianpeeters commented 7 years ago

Yeah, that makes sense. avroScalaCustomEnumStyle could have a setting like ("enum" -> "string"). Unless somebody beats me to it, I'll have a better idea of my timeframe in about a month or so.

ryan-deak-zefr commented 7 years ago

@julianpeeters I'll take this on today. Hopefully I can have a PR by the end of the long weekend.

ryan-deak-zefr commented 7 years ago

@julianpeeters: I didn't have a chance to take this on. In the interest of time, I decided to rewrite enums in avro files as string-based variables and use avrohugger, unmodified, to emit the Scala data types. Do you think that you'll be able to take a look at this in a month or so as indicated above?
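
The workaround described here (rewriting enums as strings before running avrohugger) can be sketched on a toy model of a schema. Everything below is hypothetical: the `AvroType` hierarchy is a stand-in invented for illustration, not the Apache Avro API, and it is not the code that was actually used.

```scala
// Toy model of a few Avro schema shapes (hypothetical; not the Apache Avro API).
sealed trait AvroType
case object AvroString extends AvroType
case object AvroInt extends AvroType
final case class AvroEnum(name: String, symbols: List[String]) extends AvroType
final case class AvroArray(items: AvroType) extends AvroType
final case class AvroUnion(branches: List[AvroType]) extends AvroType
final case class AvroField(name: String, tpe: AvroType)
final case class AvroRecord(name: String, fields: List[AvroField]) extends AvroType

object EnumRewriteSketch {
  // Recursively replace every enum with a string, leaving other types alone.
  def enumsToStrings(t: AvroType): AvroType = t match {
    case _: AvroEnum       => AvroString
    case AvroArray(items)  => AvroArray(enumsToStrings(items))
    // A union may not repeat a branch type, so drop duplicates that appear
    // when an enum branch collapses onto an existing string branch.
    case AvroUnion(bs)     => AvroUnion(bs.map(enumsToStrings).distinct)
    case AvroRecord(n, fs) =>
      AvroRecord(n, fs.map(f => f.copy(tpe = enumsToStrings(f.tpe))))
    case other             => other
  }
}
```

The rewritten schema can then be handed to an unmodified generator, at the cost of maintaining a second copy of each schema.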

julianpeeters commented 7 years ago

@ryan-deak-zefr no prob at all. Yes, that's looking like the right time frame for me.

johnnycaol commented 6 years ago

This makes sense, and would be nice to have. spark-avro (https://github.com/databricks/spark-avro) also maps enum to string. This change will make this library a lot more useful for Spark users.

ryan-deak-zefr commented 6 years ago

Agreed. We are spark users and do some conditional SBT stuff to get around this. I wanted to do it directly in avrohugger but I had trouble finding the time.

julianpeeters commented 6 years ago

Howdy Gang,

I finally found some time to implement this and add some thorough tests. If you'd like to kick the tires, please try avrohugger version 1.0.0-RC2 or sbt-avrohugger 2.0.0-RC2. Here's an example of how it is used.