avrohugger

Schema-to-case-class code generation for working with Avro in Scala.

avrohugger-core: Generate source code at runtime for evaluation at a later step.
avrohugger-filesorter: Sort schema files for proper compilation order.
avrohugger-tools: Generate source code at the command line with the avrohugger-tools jar.

Alternative Distributions:

sbt: sbt-avrohugger - Generate source code at compile time with an sbt plugin.
Maven: avrohugger-maven-plugin - Generate source code at compile time with a Maven plugin.
Mill: mill-avro - Generate source code at compile time with a Mill plugin.
Gradle: gradle-avrohugger-plugin - Generate source code at compile time with a Gradle plugin.
mu-rpc: mu-scala - Generate RPC models, messages, clients, and servers.

Supported Formats: Standard, SpecificRecord
Supported Datatypes
Logical Types Support
Protocol Support
Doc Support
Usage
Warnings
Best Practices
Testing
Credits

Generates Scala case classes in various formats:

Standard: Vanilla case classes (for use with Apache Avro's GenericRecord API, etc.).
SpecificRecord: Case classes that implement SpecificRecordBase and therefore have mutable var fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.).

Supports generating case classes with arbitrary fields of the following datatypes:

Avro	`Standard`	`SpecificRecord`	Notes
INT	Int	Int	See Logical Types: `date`
LONG	Long	Long	See Logical Types: `timestamp-millis`
FLOAT	Float	Float
DOUBLE	Double	Double
STRING	String	String
BOOLEAN	Boolean	Boolean
NULL	Null	Null
MAP	Map	Map
ENUM	scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString	Java Enum, EnumAsScalaString	See Customizable Type Mapping
BYTES	Array[Byte], BigDecimal	Array[Byte], BigDecimal	See Logical Types: `decimal`
FIXED	case class, case class + schema	case class extending `SpecificFixed`	See Logical Types: `decimal`
ARRAY	Seq, List, Array, Vector	Seq, List, Array, Vector	See Customizable Type Mapping
UNION	Option, Either, Shapeless Coproduct	Option, Either, Shapeless Coproduct	See Customizable Type Mapping
RECORD	case class, case class + schema	case class extending `SpecificRecordBase`	See Customizable Type Mapping
PROTOCOL	No Type, Scala ADT	RPC trait, Scala ADT	See Customizable Type Mapping
Date	java.time.LocalDate, java.sql.Date, Int	java.time.LocalDate, java.sql.Date, Int	See Customizable Type Mapping
TimeMillis	java.time.LocalTime, Int	java.time.LocalTime, Int	See Customizable Type Mapping
TimeMicros	java.time.LocalTime, Long	java.time.LocalTime, Long	See Customizable Type Mapping
TimestampMillis	java.time.Instant, java.sql.Timestamp, Long	java.time.Instant, java.sql.Timestamp, Long	See Customizable Type Mapping
TimestampMicros	java.time.Instant, java.sql.Timestamp, Long	java.time.Instant, java.sql.Timestamp, Long	See Customizable Type Mapping
LocalTimestampMillis	java.time.LocalDateTime, Long	java.time.LocalDateTime, Long	See Customizable Type Mapping
LocalTimestampMicros	java.time.LocalDateTime, Long	java.time.LocalDateTime, Long	See Customizable Type Mapping
UUID	java.util.UUID	java.util.UUID	See Customizable Type Mapping
Decimal	BigDecimal	BigDecimal	See Customizable Type Mapping

Logical Types Support:

NOTE: Logical types are currently supported only for the Standard and SpecificRecord formats.

date: Annotates Avro int schemas to generate java.time.LocalDate or java.sql.Date (See Customizable Type Mapping). Examples: avdl, avsc.
decimal: Annotates Avro bytes and fixed schemas to generate BigDecimal. Examples: avdl, avsc.
timestamp-millis: Annotates Avro long schemas to generate java.time.Instant or java.sql.Timestamp or long (See Customizable Type Mapping). Examples: avdl, avsc.
uuid: Annotates Avro string schemas and IDLs to generate java.util.UUID (See Customizable Type Mapping). Example: avsc.
time-millis: Annotates Avro int schemas to generate java.time.LocalTime or java.sql.Time or int.
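
To make the annotation concrete, here is a minimal sketch of a record whose int field carries the `date` logical type, run through the `Generator` API described under Usage. The `Event` record, its `example` namespace, and the `day` field are hypothetical names chosen for illustration. With the default type mapping the field is expected to come out as one of the date types listed above rather than a plain Int, though the exact generated text may differ between versions.

```scala
import avrohugger.Generator
import avrohugger.format.Standard

// Hypothetical schema: an int field annotated with the `date` logical type.
val eventSchema: String =
  """{
    |  "type": "record",
    |  "name": "Event",
    |  "namespace": "example",
    |  "fields": [
    |    {"name": "day", "type": {"type": "int", "logicalType": "date"}}
    |  ]
    |}""".stripMargin

// Generate Standard-format source for the schema and print it for inspection.
val generatedEvent: List[String] = new Generator(Standard).stringToStrings(eventSchema)
generatedEvent.foreach(println)
```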

Protocol Support:

The records defined in .avdl, .avpr, and JSON protocol strings can be generated as ADTs if the protocols define more than one Scala definition (note: message definitions are ignored when this setting is used). See Customizable Type Mapping.
For SpecificRecord, if the protocol contains messages then an RPC trait is generated (instead of generating an ADT, or ignoring the message definitions).

Doc Support:

.avdl: Comments that begin with /** are used as the documentation string for the type or field definition that follows the comment.
.avsc, .avpr, and .avro: Docs in Avro schemas are used to define a case class's ScalaDoc.
.scala: ScalaDocs of case class definitions are used to define record and field docs.

Note: Currently Treehugger appears to generate Javadoc style docs (thus compatible with ScalaDoc style).

Usage

Library For Scala 2.12, 2.13, and 3
Parses Schemas and IDLs with Avro version 1.11
Generates Code Compatible with Scala 2.12, 2.13, 3

`avrohugger-core`

Get the dependency with:

"com.julianpeeters" %% "avrohugger-core" % "2.8.3"

Description:

Instantiate a Generator with Standard or SpecificRecord source formats. Then use

tToFile(input: T, outputDir: String): Unit

tToStrings(input: T): List[String]

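where T can be File, Schema, or String, and the lowercase t in the method name is the matching prefix (e.g. fileToFile, schemaToStrings, stringToStrings).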

Example

import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File

val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"

where an input File can be .avro, .avsc, .avpr, or .avdl,

and where an input String can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement SpecificRecordBase.
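
As a small illustration of the string-based entry point, the sketch below passes an inline schema to `stringToStrings` with the `Standard` format. The `User` schema and its `example` namespace are made up for illustration, and the exact layout of the generated source may vary between versions.

```scala
import avrohugger.Generator
import avrohugger.format.Standard

// Hypothetical schema, used only for illustration.
val userSchema: String =
  """{
    |  "type": "record",
    |  "name": "User",
    |  "namespace": "example",
    |  "fields": [
    |    {"name": "name", "type": "string"},
    |    {"name": "age", "type": "int"}
    |  ]
    |}""".stripMargin

val standardGenerator = new Generator(Standard)

// Each element of the result is the Scala source of one generated compilation unit.
val generatedUser: List[String] = standardGenerator.stringToStrings(userSchema)
generatedUser.foreach(println)
```

Returning strings rather than writing files can be convenient when the generated source is inspected or compiled in a later step.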

Customizable Type Mapping:

To reassign Scala types to Avro types, use the following (e.g. for customizing SpecificRecord):

import avrohugger.Generator
import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector

// Start from the format's defaults and override only the array mapping.
val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)

record can be assigned to ScalaCaseClass and ScalaCaseClassWithSchema (with schema in a companion object)
array can be assigned to ScalaSeq, ScalaArray, ScalaList, and ScalaVector
enum can be assigned to JavaEnum, ScalaCaseObjectEnum, EnumAsScalaString, and ScalaEnumeration
fixed can be assigned to ScalaCaseClassWrapper and ScalaCaseClassWrapperWithSchema (with schema in a companion object)
union can be assigned to OptionShapelessCoproduct, OptionEitherShapelessCoproduct, or OptionalShapelessCoproduct
int, long, float, double can be assigned to ScalaInt, ScalaLong, ScalaFloat, ScalaDouble
protocol can be assigned to ScalaADT and NoTypeGenerated
decimal can be assigned to e.g. ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN)) and ScalaBigDecimalWithPrecision(None) (via Shapeless Tagged Types)

Specifically for unions:

Field Type ⬇️ / Behaviour ➡️	OptionShapelessCoproduct	OptionEitherShapelessCoproduct	OptionalShapelessCoproduct
`[{"type": "map", "values": "string"}]`	`Map[String, String]`	`Map[String, String]`	`Map[String, String] :+: CNil`
`["null", "double"]`	`Option[Double]`	`Option[Double]`	`Option[Double :+: CNil]`
`["int", "string"]`	`Int :+: String :+: CNil`	`Either[Int, String]`	`Int :+: String :+: CNil`
`["null", "int", "string"]`	`Option[Int :+: String :+: CNil]`	`Option[Either[Int, String]]`	`Option[Int :+: String :+: CNil]`
`["boolean", "int", "string"]`	`Boolean :+: Int :+: String :+: CNil`	`Boolean :+: Int :+: String :+: CNil`	`Boolean :+: Int :+: String :+: CNil`
`["null", "boolean", "int", "string"]`	`Option[Boolean :+: Int :+: String :+: CNil]`	`Option[Boolean :+: Int :+: String :+: CNil]`	`Option[Boolean :+: Int :+: String :+: CNil]`
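
The union behaviours in the table can be tried out with a sketch like the following, which selects `OptionEitherShapelessCoproduct` for the `Standard` format. The `Result` record and its `value` field are hypothetical. According to the `["int", "string"]` row above, the field should be generated as `Either[Int, String]`.

```scala
import avrohugger.Generator
import avrohugger.format.Standard
import avrohugger.types.OptionEitherShapelessCoproduct

// Hypothetical record with a two-member, non-null union field.
val resultSchema: String =
  """{
    |  "type": "record",
    |  "name": "Result",
    |  "namespace": "example",
    |  "fields": [
    |    {"name": "value", "type": ["int", "string"]}
    |  ]
    |}""".stripMargin

// Override only the union mapping; every other type keeps its default.
val eitherTypes = Some(Standard.defaultTypes.copy(union = OptionEitherShapelessCoproduct))
val eitherGenerator = new Generator(Standard, avroScalaCustomTypes = eitherTypes)

val generatedResult: List[String] = eitherGenerator.stringToStrings(resultSchema)
generatedResult.foreach(println)
```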

Customizable Namespace Mapping:

Namespaces can be reassigned by instantiating a Generator with a custom namespace map:

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace"->"newnamespace"))

Note: Namespace mappings work with KafkaAvroSerializer but not with KafkaAvroDeserializer; if anyone knows how to configure the deserializer to map incoming schema names to target class names, please speak up!

Wildcarding the beginning of a namespace is permitted: place a single asterisk after the prefix that you want to map, and any matching schema will have its namespace rewritten. Multiple conflicting wildcards are not permitted.

val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("example.*"->"example.newnamespace"))
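
For the exact (non-wildcard) mapping, a minimal end-to-end sketch might look like this. The `Legacy` record is hypothetical, and the `Standard` format is used here only to keep the printed output short. The package declaration of the generated source is expected to use the new namespace.

```scala
import avrohugger.Generator
import avrohugger.format.Standard

// Hypothetical schema declared in the namespace that should be replaced.
val legacySchema: String =
  """{
    |  "type": "record",
    |  "name": "Legacy",
    |  "namespace": "oldnamespace",
    |  "fields": [
    |    {"name": "id", "type": "long"}
    |  ]
    |}""".stripMargin

// Map the schema's namespace to a different Scala package.
val renamingGenerator =
  new Generator(Standard, avroScalaCustomNamespace = Map("oldnamespace" -> "newnamespace"))

val renamed: List[String] = renamingGenerator.stringToStrings(legacySchema)
renamed.foreach(println)
```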

`avrohugger-filesorter`

Get the dependency with:

"com.julianpeeters" %% "avrohugger-filesorter" % "2.8.3"

Description:

To ensure dependent schemas are compiled in the proper order (thus avoiding org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord" parser errors), sort avsc and avdl files with the sortSchemaFiles method on AvscFileSorter and AvdlFileSorter, respectively.

Example:

import avrohugger.filesorter.AvscFileSorter
import java.io.File

val sorted: List[File] = AvscFileSorter.sortSchemaFiles((srcDir ** "*.avsc"))

`avrohugger-tools`

Download the avrohugger-tools jar for Scala 2.12 or Scala 2.13 (>30MB!) and use it like the avro-tools jar. Usage: [-string] (schema|protocol|datafile) input... outputdir

generate generates Scala case class definitions:

java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate schema user.avsc .

generate-specific generates definitions that extend Avro's SpecificRecordBase:

java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate-specific schema user.avsc .

Warnings

1) If your framework relies on reflection to get the Schema, it will fail because the fields of the generated Scala classes are private. Preempt this by passing a Schema to DatumReaders and DatumWriters (e.g. val sdw = new SpecificDatumWriter[MyRecord](schema)).

2) For the SpecificRecord format, generated case class fields must be mutable (var) in order to be compatible with the SpecificRecord API. Note: If your framework allows GenericRecord, avro4s provides a type class that converts to and from immutable case classes cleanly.

3) SpecificRecord requires that enum be represented as JavaEnum.

Testing

To test for regressions, please run sbt:avrohugger> + test.

To test that generated code can be de/serialized as expected, please run:

1) sbt:avrohugger> + publishLocal

2) then clone sbt-avrohugger and update its avrohugger dependency to the locally published version

3) finally run sbt:sbt-avrohugger> scripted avrohugger/*, or, e.g., scripted avrohugger/GenericSerializationTests

Credits

Depends on Avro and Treehugger. avrohugger-tools is based on avro-tools.

Contributors:


Marius Soutier, Brian London, alancnet, Matt Coffin, Ryan Koval, Simonas Gelazevicius, Paul Snively, Marco Stefani, Andrew Gustafson, Kostya Golikov, Plínio Pantaleão, Sietse de Kaper, Martin Mauch, Leon Poon, Paul Pearcy, Matt Allen, C-zito, Tim Chan, Saket, Daniel Davis, Zach Cox, Diego E. Alonso Blas, Fede Fernández, Rob Landers, Simon Petty, Andreas Drobisch, Timo Schmid, Dmytro Orlov, Stefano Galarraga, Lars Albertsson, Eugene Platonov, Jerome Wacongne, Jon Morra, Raúl Raja Martínez, Kaur Matas, Chris Albright, Francisco Díaz, Bobby Rauchenberg, Leonard Ehrenfried, François Sarradin, niqdev, rsitze-mmai, Julien BENOIT, Adam Drakeford, Carlos Silva, ismail Benammar, mcenkar, Luca Tronchin, LydiaSkuse, Algimantas Milašius, Massimo Siani, Konstantin, natefitzgerald, Victor, steve-e

Criticism is appreciated.

Fork away, just make sure the tests pass before sending a pull request.