Closed jon-morra-zefr closed 6 years ago
Yeah, that makes sense. avroScalaCustomEnumStyle could have a setting like ("enum" -> "string"). Unless somebody beats me to it, I'll have a better idea of my timeframe in about a month or so.
@julianpeeters I'll take this on today. Hopefully I can have a PR by the end of the long weekend.
@julianpeeters: I didn't have a chance to take this on. In the interest of time, I decided to rewrite enums in avro files as string-based variables and use avrohugger, unmodified, to emit the Scala data types. Do you think that you'll be able to take a look at this in a month or so as indicated above?
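For readers who want to try the same workaround, here is a minimal sketch of rewriting enums as strings in an Avro schema before handing it to an unmodified avrohugger. It is not the script used above: the function name `enums_to_strings` and the sample schema are hypothetical, and it assumes the schema is a plain `.avsc` JSON document. It does not rewrite later references to an enum by name, so those would still need handling.

```python
import json


def enums_to_strings(schema):
    """Return a copy of a parsed Avro schema with every inline enum replaced by "string"."""
    if isinstance(schema, list):
        # A union: convert each branch and drop duplicates, since a union
        # may not contain the same type twice (e.g. two enums -> one "string").
        branches = []
        for branch in schema:
            converted = enums_to_strings(branch)
            if converted not in branches:
                branches.append(converted)
        return branches
    if isinstance(schema, dict):
        if schema.get("type") == "enum":
            return "string"
        out = dict(schema)  # shallow copy so the input is left untouched
        for key in ("type", "items", "values"):
            if key in out:
                out[key] = enums_to_strings(out[key])
        if "fields" in out:
            out["fields"] = [enums_to_strings(field) for field in out["fields"]]
        return out
    # A primitive type name or a reference to a named type: leave as-is.
    return schema


# Hypothetical schema, for illustration only.
example = json.loads("""
{
  "type": "record",
  "name": "Card",
  "fields": [
    {"name": "suit",
     "type": {"type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS"]}},
    {"name": "color",
     "type": ["null", {"type": "enum", "name": "Color", "symbols": ["RED", "BLACK"]}]},
    {"name": "rank", "type": "int"}
  ]
}
""")

converted = enums_to_strings(example)
print(json.dumps(converted, indent=2))
```

The converted schema can be written back out with `json.dump` and used as the input for code generation. Enum defaults survive the rewrite because an enum default is already a JSON string.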
@ryan-deak-zefr no prob at all. Yes, that's looking like the right time frame for me.
This makes sense, and would be nice to have. spark-avro also maps enum to string (https://github.com/databricks/spark-avro). This change will make this library a lot more useful for Spark users.
Agreed. We are Spark users and do some conditional SBT stuff to get around this. I wanted to do it directly in avrohugger but I had trouble finding the time.
Howdy Gang,
I finally found some time to implement this and add some thorough tests. If you'd like to kick the tires, please try avrohugger version 1.0.0-RC2 or sbt-avrohugger 2.0.0-RC2. Here's an example of how it is used.
Spark does not have a native representation of enums. This makes using avrohugger with Spark very difficult when the avro type has enums in it. The easiest way around this is to add the ability for avrohugger to read in enum types from the avro schema and generate these types as String. While I recognize this is not an accurate mapping of the supplied type, this will make loading typed data into Spark a LOT easier.