julianpeeters / avrohugger

Generate Scala case class definitions from Avro schemas
Apache License 2.0

avrohugger

Schema-to-case-class code generation for working with Avro in Scala.

Alternative Distributions:

Generates Scala case classes in various formats:
Supports generating case classes with arbitrary fields of the following datatypes:

| Avro | Standard | SpecificRecord | Notes |
| --- | --- | --- | --- |
| INT | Int | Int | See Logical Types: date |
| LONG | Long | Long | See Logical Types: timestamp-millis |
| FLOAT | Float | Float | |
| DOUBLE | Double | Double | |
| STRING | String | String | |
| BOOLEAN | Boolean | Boolean | |
| NULL | Null | Null | |
| MAP | Map | Map | |
| ENUM | scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString | Java Enum, EnumAsScalaString | See Customizable Type Mapping |
| BYTES | Array[Byte], BigDecimal | Array[Byte], BigDecimal | See Logical Types: decimal |
| FIXED | case class, case class + schema | case class extending SpecificFixed | See Logical Types: decimal |
| ARRAY | Seq, List, Array, Vector | Seq, List, Array, Vector | See Customizable Type Mapping |
| UNION | Option, Either, Shapeless Coproduct | Option, Either, Shapeless Coproduct | See Customizable Type Mapping |
| RECORD | case class, case class + schema | case class extending SpecificRecordBase | See Customizable Type Mapping |
| PROTOCOL | No Type, Scala ADT | RPC trait, Scala ADT | See Customizable Type Mapping |
| Date | java.time.LocalDate, java.sql.Date, Int | java.time.LocalDate, java.sql.Date, Int | See Customizable Type Mapping |
| TimeMillis | java.time.LocalTime, Int | java.time.LocalTime, Int | See Customizable Type Mapping |
| TimeMicros | java.time.LocalTime, Long | java.time.LocalTime, Long | See Customizable Type Mapping |
| TimestampMillis | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
| TimestampMicros | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
| LocalTimestampMillis | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
| LocalTimestampMicros | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
| UUID | java.util.UUID | java.util.UUID | See Customizable Type Mapping |
| Decimal | BigDecimal | BigDecimal | See Customizable Type Mapping |

Logical Types Support:

NOTE: Currently logical types are only supported for Standard and SpecificRecord formats

Protocol Support:
Doc Support:

Note: Currently Treehugger appears to generate Javadoc style docs (thus compatible with ScalaDoc style).

Usage

avrohugger-core

Get the dependency with:
"com.julianpeeters" %% "avrohugger-core" % "2.8.3"
Description:

Instantiate a Generator with Standard or SpecificRecord source formats. Then use

tToFile(input: T, outputDir: String): Unit

or

tToStrings(input: T): List[String]

where T can be File, Schema, or String.

Example
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File

val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"

where an input File can be .avro, .avsc, .avpr, or .avdl,

and where an input String can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement SpecificRecordBase.

Customizable Type Mapping:

To reassign Scala types to Avro types, use the following (e.g. for customizing SpecificRecord):

import avrohugger.Generator
import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector

val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)

Specifically for unions:

| Field Type ⬇️ / Behaviour ➡️ | OptionShapelessCoproduct | OptionEitherShapelessCoproduct | OptionalShapelessCoproduct |
| --- | --- | --- | --- |
| [{"type": "map", "values": "string"}] | Map[String, String] | Map[String, String] | Map[String, String] :+: CNil |
| ["null", "double"] | Option[Double] | Option[Double] | Option[Double :+: CNil] |
| ["int", "string"] | Int :+: String :+: CNil | Either[Int, String] | Int :+: String :+: CNil |
| ["null", "int", "string"] | Option[Int :+: String :+: CNil] | Option[Either[Int, String]] | Option[Int :+: String :+: CNil] |
| ["boolean", "int", "string"] | Boolean :+: Int :+: String :+: CNil | Boolean :+: Int :+: String :+: CNil | Boolean :+: Int :+: String :+: CNil |
| ["null", "boolean", "int", "string"] | Option[Boolean :+: Int :+: String :+: CNil] | Option[Boolean :+: Int :+: String :+: CNil] | Option[Boolean :+: Int :+: String :+: CNil] |

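As a rough illustration of the OptionEitherShapelessCoproduct column, the sketch below shows how calling code might consume a field typed as Option[Either[Int, String]], the mapping listed for ["null", "int", "string"]. The Reading class and the describe helper are hypothetical names made up for this example; they are not generated by avrohugger.

```scala
object UnionFieldSketch {
  // Hypothetical record with one field whose Avro type is ["null", "int", "string"],
  // typed the way the OptionEitherShapelessCoproduct column describes.
  final case class Reading(value: Option[Either[Int, String]])

  // Pattern match over the three branches of the union.
  def describe(r: Reading): String = r.value match {
    case None           => "null"
    case Some(Left(i))  => s"int: $i"
    case Some(Right(s)) => s"string: $s"
  }
}
```

Unions with three or more non-null branches map to a Shapeless Coproduct in every column, so this Either-based style only applies to the two-branch case.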
Customizable Namespace Mapping:

Namespaces can be reassigned by instantiating a Generator with a custom namespace map:

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace"->"newnamespace"))

Note: Namespace mappings work with KafkaAvroSerializer but not with KafkaAvroDeserializer; if anyone knows how to configure the deserializer to map incoming schema names to target class names, please speak up!

Wildcarding the beginning of a namespace is permitted: place a single asterisk after the prefix that you want to map, and any matching schema will have its namespace rewritten. Multiple conflicting wildcards are not permitted.

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("example.*"->"example.newnamespace"))
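To make the wildcard rule concrete, here is a small stand-alone sketch of one way such a lookup could behave. It is not avrohugger's implementation: the rewrite helper is hypothetical, and it assumes that exact keys take precedence over wildcard keys and that a match replaces the whole namespace with the mapped value.

```scala
object NamespaceMappingSketch {
  // Hypothetical helper, NOT avrohugger's code.
  // Assumptions: an exact key wins over a wildcard key, and a match
  // replaces the entire namespace with the mapped value.
  def rewrite(namespace: String, mappings: Map[String, String]): String = {
    val exact = mappings.get(namespace)
    // A key such as "example.*" matches any namespace starting with "example."
    val wildcard = mappings.collectFirst {
      case (key, target) if key.endsWith("*") && namespace.startsWith(key.dropRight(1)) => target
    }
    exact.orElse(wildcard).getOrElse(namespace)
  }
}
```

Under these assumptions, a namespace that matches no key is returned unchanged.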

avrohugger-filesorter

Get the dependency with:
"com.julianpeeters" %% "avrohugger-filesorter" % "2.8.3"
Description:

To ensure dependent schemas are compiled in the proper order (thus avoiding org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord" parser errors), sort avsc and avdl files with the sortSchemaFiles method on AvscFileSorter and AvdlFileSorter, respectively.

Example:
import avrohugger.filesorter.AvscFileSorter
import java.io.File

val sorted: List[File] = AvscFileSorter.sortSchemaFiles((srcDir ** "*.avsc"))
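The idea behind the ordering can be sketched without any files at all. The snippet below is an illustration only, not the AvscFileSorter implementation: sortByDependencies is a hypothetical helper that takes a map from record name to the names it references and returns the names so that each one comes after its dependencies. It assumes the dependency graph is acyclic and that every referenced name is also a key of the map.

```scala
object SchemaOrderSketch {
  // Hypothetical input: record name -> names of the records it references.
  // Returns names ordered so that every record appears after its dependencies.
  // Assumes no cycles and that every referenced name is also a key.
  def sortByDependencies(deps: Map[String, Set[String]]): List[String] = {
    @annotation.tailrec
    def loop(remaining: Map[String, Set[String]], sorted: List[String]): List[String] =
      if (remaining.isEmpty) sorted.reverse
      else {
        val done = sorted.toSet
        // Records whose dependencies have all been emitted already
        val ready = remaining.collect { case (name, ds) if ds.forall(done) => name }.toList.sorted
        require(ready.nonEmpty, "cyclic or unresolved dependency")
        loop(remaining -- ready, ready.reverse ::: sorted)
      }
    loop(deps, Nil)
  }
}
```

Parsing schemas in the returned order means a referenced record is always defined before the schema that uses it, which is the situation the Undefined name error signals has been violated.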

avrohugger-tools

Download the avrohugger-tools jar for Scala 2.12 or Scala 2.13 (>30MB!) and use it like the avro-tools jar. Usage: [-string] (schema|protocol|datafile) input... outputdir

java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate schema user.avsc .

java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate-specific schema user.avsc .

Warnings

1) If your framework relies on reflection to get the Schema, it will fail, since Scala fields are private. Preempt this by passing a Schema to DatumReaders and DatumWriters (e.g. val sdw = new SpecificDatumWriter[MyRecord](schema)).

2) For the SpecificRecord format, generated case class fields must be mutable (var) in order to be compatible with the SpecificRecord API. Note: If your framework allows GenericRecord, avro4s provides a type class that converts to and from immutable case classes cleanly.

3) SpecificRecord requires that enums be represented as JavaEnum.
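Warning 2 is easier to see with a stripped-down model. The sketch below uses a stand-in trait, IndexedLike, that mirrors only the get and put methods of Avro's IndexedRecord interface, so it runs without the Avro dependency. The User class is a hypothetical example and is not output copied from avrohugger; it only shows why index-based put forces the fields to be var.

```scala
object MutableFieldSketch {
  // Stand-in for the two IndexedRecord methods that matter here; the real
  // interface is org.apache.avro.generic.IndexedRecord.
  trait IndexedLike {
    def get(i: Int): AnyRef
    def put(i: Int, v: Any): Unit
  }

  // Hypothetical generated-style class: fields are `var` so put can assign them.
  final case class User(var name: String, var age: Int) extends IndexedLike {
    def this() = this("", 0) // no-arg constructor, filled in later via put

    def get(i: Int): AnyRef = i match {
      case 0 => name
      case 1 => Int.box(age)
      case _ => throw new IndexOutOfBoundsException(s"Bad index: $i")
    }

    def put(i: Int, v: Any): Unit = i match {
      case 0 => name = v.toString
      case 1 => age = v.asInstanceOf[Int]
      case _ => throw new IndexOutOfBoundsException(s"Bad index: $i")
    }
  }
}
```

A reader that creates an empty instance and then fills it field by field through put cannot work with immutable val fields, which is the constraint the warning describes.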

Testing

To test for regressions, please run sbt:avrohugger> + test.

To test that generated code can be de/serialized as expected, please:

1) run sbt:avrohugger> + publishLocal

2) clone sbt-avrohugger and update its avrohugger dependency to the locally published version

3) run sbt:sbt-avrohugger> scripted avrohugger/*, or, e.g., scripted avrohugger/GenericSerializationTests

Credits

Depends on Avro and Treehugger. avrohugger-tools is based on avro-tools.

Contributors:

Marius Soutier
Brian London
alancnet
Matt Coffin
Ryan Koval
Simonas Gelazevicius
Paul Snively
Marco Stefani
Andrew Gustafson
Kostya Golikov
Plínio Pantaleão
Sietse de Kaper
Martin Mauch
Leon Poon
Paul Pearcy
Matt Allen
C-zito
Tim Chan
Saket
Daniel Davis
Zach Cox
Diego E. Alonso Blas
Fede Fernández
Rob Landers
Simon Petty
Andreas Drobisch
Timo Schmid
Dmytro Orlov
Stefano Galarraga
Lars Albertsson
Eugene Platonov
Jerome Wacongne
Jon Morra
Raúl Raja Martínez
Kaur Matas
Chris Albright
Francisco Díaz
Bobby Rauchenberg
Leonard Ehrenfried
François Sarradin
niqdev
rsitze-mmai
Julien BENOIT
Adam Drakeford
Carlos Silva
ismail Benammar
mcenkar
Luca Tronchin
LydiaSkuse
Algimantas Milašius
Massimo Siani
Konstantin
natefitzgerald
Victor
steve-e
Criticism is appreciated.
Fork away, just make sure the tests pass before sending a pull request.