Schema-to-case-class code generation for working with Avro in Scala.

- `avrohugger-core`: Generate source code at runtime for evaluation at a later step.
- `avrohugger-filesorter`: Sort schema files for proper compilation order.
- `avrohugger-tools`: Generate source code at the command line with the avrohugger-tools jar.

Alternative Distributions:

- `sbt-avrohugger` - Generate source code at compile time with an sbt plugin.
- `avrohugger-maven-plugin` - Generate source code at compile time with a Maven plugin.
- `mill-avro` - Generate source code at compile time with a Mill plugin.
- `gradle-avrohugger-plugin` - Generate source code at compile time with a Gradle plugin.
- `mu-scala` - Generate RPC models, messages, clients, and servers.

Supported formats: `Standard`, `SpecificRecord`

- `Standard`: Vanilla case classes (for use with Apache Avro's `GenericRecord` API, etc.)
- `SpecificRecord`: Case classes that implement `SpecificRecordBase` and therefore have mutable `var` fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.).
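To illustrate the difference, given a hypothetical `User` schema, the two formats produce roughly the following shapes (a sketch, not verbatim generator output; the `SpecificRecordBase` member bodies are elided):

```scala
// Hypothetical input schema:
// {"type": "record", "name": "User",
//  "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}

// Standard: an immutable vanilla case class
case class User(name: String, age: Int)

// SpecificRecord: mutable var fields plus the SpecificRecordBase plumbing (sketch)
case class UserSpecific(var name: String, var age: Int)
  extends org.apache.avro.specific.SpecificRecordBase {
  def getSchema: org.apache.avro.Schema = ???  // returns the record's parsed schema
  def get(field: Int): AnyRef = ???            // positional getter used by the Specific API
  def put(field: Int, value: Any): Unit = ???  // positional setter used by the Specific API
}
```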
| Avro | `Standard` | `SpecificRecord` | Notes |
|---|---|---|---|
| INT | `Int` | `Int` | See Logical Types: date |
| LONG | `Long` | `Long` | See Logical Types: timestamp-millis |
| FLOAT | `Float` | `Float` | |
| DOUBLE | `Double` | `Double` | |
| STRING | `String` | `String` | |
| BOOLEAN | `Boolean` | `Boolean` | |
| NULL | `Null` | `Null` | |
| MAP | `Map` | `Map` | |
| ENUM | `scala.Enumeration`, Scala case object, Java Enum, `EnumAsScalaString` | Java Enum, `EnumAsScalaString` | See Customizable Type Mapping |
| BYTES | `Array[Byte]`, `BigDecimal` | `Array[Byte]`, `BigDecimal` | See Logical Types: decimal |
| FIXED | case class, case class + schema | case class extending `SpecificFixed` | See Logical Types: decimal |
| ARRAY | `Seq`, `List`, `Array`, `Vector` | `Seq`, `List`, `Array`, `Vector` | See Customizable Type Mapping |
| UNION | `Option`, `Either`, Shapeless `Coproduct` | `Option`, `Either`, Shapeless `Coproduct` | See Customizable Type Mapping |
| RECORD | case class, case class + schema | case class extending `SpecificRecordBase` | See Customizable Type Mapping |
| PROTOCOL | No Type, Scala ADT | RPC trait, Scala ADT | See Customizable Type Mapping |
| Date | `java.time.LocalDate`, `java.sql.Date`, `Int` | `java.time.LocalDate`, `java.sql.Date`, `Int` | See Customizable Type Mapping |
| TimeMillis | `java.time.LocalTime`, `Int` | `java.time.LocalTime`, `Int` | See Customizable Type Mapping |
| TimeMicros | `java.time.LocalTime`, `Long` | `java.time.LocalTime`, `Long` | See Customizable Type Mapping |
| TimestampMillis | `java.time.Instant`, `java.sql.Timestamp`, `Long` | `java.time.Instant`, `java.sql.Timestamp`, `Long` | See Customizable Type Mapping |
| TimestampMicros | `java.time.Instant`, `java.sql.Timestamp`, `Long` | `java.time.Instant`, `java.sql.Timestamp`, `Long` | See Customizable Type Mapping |
| LocalTimestampMillis | `java.time.LocalDateTime`, `Long` | `java.time.LocalDateTime`, `Long` | See Customizable Type Mapping |
| LocalTimestampMicros | `java.time.LocalDateTime`, `Long` | `java.time.LocalDateTime`, `Long` | See Customizable Type Mapping |
| UUID | `java.util.UUID` | `java.util.UUID` | See Customizable Type Mapping |
| Decimal | `BigDecimal` | `BigDecimal` | See Customizable Type Mapping |
NOTE: Currently, logical types are only supported for the `Standard` and `SpecificRecord` formats.

- `date`: Annotates Avro `int` schemas to generate `java.time.LocalDate` or `java.sql.Date` (see Customizable Type Mapping). Examples: avdl, avsc.
- `decimal`: Annotates Avro `bytes` and `fixed` schemas to generate `BigDecimal`. Examples: avdl, avsc.
- `timestamp-millis`: Annotates Avro `long` schemas to generate `java.time.Instant`, `java.sql.Timestamp`, or `long` (see Customizable Type Mapping). Examples: avdl, avsc.
- `uuid`: Annotates Avro `string` schemas and IDLs to generate `java.util.UUID` (see Customizable Type Mapping). Example: avsc.
- `time-millis`: Annotates Avro `int` schemas to generate `java.time.LocalTime`, `java.sql.Time`, or `int`.
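For example, under the default type mapping, a field annotated with the `date` logical type (hypothetical schema and names) maps to `java.time.LocalDate`:

```scala
// Hypothetical .avsc using the `date` logical type:
// {"type": "record", "name": "Birthday",
//  "fields": [{"name": "day", "type": {"type": "int", "logicalType": "date"}}]}

// With the default logical-type mapping, the generated case class is roughly:
case class Birthday(day: java.time.LocalDate)
```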
The records defined in `.avdl`, `.avpr`, and JSON protocol strings can be generated as ADTs if the protocols define more than one Scala definition (note: message definitions are ignored when this setting is used). See Customizable Type Mapping.

For `SpecificRecord`, if the protocol contains messages, then an RPC trait is generated (instead of generating an ADT or ignoring the message definitions).
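As an illustration, an IDL protocol declaring two records (hypothetical names) can come out as a sealed trait hierarchy; the exact generated code may differ, this is a sketch:

```scala
// Hypothetical .avdl input:
// protocol Shapes {
//   record Circle { double radius; }
//   record Square { double side; }
// }

// Generated ADT (sketch): one sealed trait with a case class per record
sealed trait Shapes extends Product with Serializable
final case class Circle(radius: Double) extends Shapes
final case class Square(side: Double) extends Shapes
```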
- `.avdl`: Comments that begin with `/**` are used as the documentation string for the type or field definition that follows the comment.
- `.avsc`, `.avpr`, and `.avro`: Docs in Avro schemas are used to define a case class's ScalaDoc.
- `.scala`: ScalaDocs of case class definitions are used to define record and field docs.

Note: Currently, Treehugger appears to generate Javadoc-style docs (thus compatible with ScalaDoc style).
avrohugger-core:

```scala
"com.julianpeeters" %% "avrohugger-core" % "2.8.3"
```

Instantiate a `Generator` with the `Standard` or `SpecificRecord` source format. Then use `tToFile(input: T, outputDir: String): Unit` or `tToStrings(input: T): List[String]`, where `T` can be `File`, `Schema`, or `String` (i.e. `fileToFile`, `schemaToStrings`, etc.):

```scala
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File

val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"
```
An input `File` can be `.avro`, `.avsc`, `.avpr`, or `.avdl`, and an input `String` can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement `SpecificRecordBase`.
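A string input works the same way. A minimal sketch (the schema and namespace here are hypothetical) that returns generated source as strings instead of writing files:

```scala
import avrohugger.Generator
import avrohugger.format.Standard

// A schema given as a plain String rather than a File
val schemaString =
  """{"type": "record", "name": "User", "namespace": "com.example",
    |  "fields": [{"name": "name", "type": "string"}]}""".stripMargin

val generator = new Generator(Standard)
// stringToStrings returns the generated Scala source, one String per definition
val sources: List[String] = generator.stringToStrings(schemaString)
```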
To reassign Scala types to Avro types, use the following (e.g. for customizing the `SpecificRecord` format):

```scala
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector

val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)
```
- `record` can be assigned to `ScalaCaseClass` and `ScalaCaseClassWithSchema` (with schema in a companion object)
- `array` can be assigned to `ScalaSeq`, `ScalaArray`, `ScalaList`, and `ScalaVector`
- `enum` can be assigned to `JavaEnum`, `ScalaCaseObjectEnum`, `EnumAsScalaString`, and `ScalaEnumeration`
- `fixed` can be assigned to `ScalaCaseClassWrapper` and `ScalaCaseClassWrapperWithSchema` (with schema in a companion object)
- `union` can be assigned to `OptionShapelessCoproduct`, `OptionEitherShapelessCoproduct`, or `OptionalShapelessCoproduct`
- `int`, `long`, `float`, `double` can be assigned to `ScalaInt`, `ScalaLong`, `ScalaFloat`, `ScalaDouble`
- `protocol` can be assigned to `ScalaADT` and `NoTypeGenerated`
- `decimal` can be assigned to e.g. `ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN))` and `ScalaBigDecimalWithPrecision(None)` (via Shapeless Tagged Types)

Specifically for unions:
| Field Type ⬇️ / Behaviour ➡️ | `OptionShapelessCoproduct` | `OptionEitherShapelessCoproduct` | `OptionalShapelessCoproduct` |
|---|---|---|---|
| `[{"type": "map", "values": "string"}]` | `Map[String, String]` | `Map[String, String]` | `Map[String, String] :+: CNil` |
| `["null", "double"]` | `Option[Double]` | `Option[Double]` | `Option[Double :+: CNil]` |
| `["int", "string"]` | `Int :+: String :+: CNil` | `Either[Int, String]` | `Int :+: String :+: CNil` |
| `["null", "int", "string"]` | `Option[Int :+: String :+: CNil]` | `Option[Either[Int, String]]` | `Option[Int :+: String :+: CNil]` |
| `["boolean", "int", "string"]` | `Boolean :+: Int :+: String :+: CNil` | `Boolean :+: Int :+: String :+: CNil` | `Boolean :+: Int :+: String :+: CNil` |
| `["null", "boolean", "int", "string"]` | `Option[Boolean :+: Int :+: String :+: CNil]` | `Option[Boolean :+: Int :+: String :+: CNil]` | `Option[Boolean :+: Int :+: String :+: CNil]` |
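For example, under `OptionEitherShapelessCoproduct`, a record containing the unions from the table above would come out with fields shaped roughly like this (the record and field names are hypothetical):

```scala
import shapeless.{:+:, CNil}

// Sketch of a case class generated under OptionEitherShapelessCoproduct
case class Unions(
  maybeDouble: Option[Double],                 // ["null", "double"]
  intOrString: Either[Int, String],            // ["int", "string"]
  mixed: Boolean :+: Int :+: String :+: CNil   // ["boolean", "int", "string"]
)
```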
Namespaces can be reassigned by instantiating a `Generator` with a custom namespace map:

```scala
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace" -> "newnamespace"))
```

Note: Namespace mappings work with KafkaAvroSerializer but not with KafkaAvroDeserializer; if anyone knows how to configure the deserializer to map incoming schema names to target class names, please speak up!

Wildcarding the beginning of a namespace is permitted: place a single asterisk after the prefix that you want to map, and any matching schema will have its namespace rewritten. Multiple conflicting wildcards are not permitted.

```scala
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("example.*" -> "example.newnamespace"))
```
avrohugger-filesorter:

```scala
"com.julianpeeters" %% "avrohugger-filesorter" % "2.8.3"
```

To ensure dependent schemas are compiled in the proper order (thus avoiding `org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord"` parser errors), sort avsc and avdl files with the `sortSchemaFiles` method on `AvscFileSorter` and `AvdlFileSorter`, respectively:
```scala
import avrohugger.filesorter.AvscFileSorter
import java.io.File

val sorted: List[File] = AvscFileSorter.sortSchemaFiles((srcDir ** "*.avsc").get)
```
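Outside of an sbt `PathFinder` context, the same method can be fed any collection of schema files. A sketch, with a hypothetical schema directory:

```scala
import avrohugger.filesorter.AvscFileSorter
import java.io.File

// Collect the .avsc files from a (hypothetical) schema directory
val schemaDir = new File("src/main/avro")
val avscFiles = schemaDir.listFiles((_, name) => name.endsWith(".avsc")).toList

// Dependencies are ordered before the schemas that reference them
val sorted = AvscFileSorter.sortSchemaFiles(avscFiles)
```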
avrohugger-tools:

Download the avrohugger-tools jar for Scala 2.12 or Scala 2.13 (>30 MB!) and use it like the avro-tools jar: `Usage: [-string] (schema|protocol|datafile) input... outputdir`

- `generate` generates Scala case class definitions:
  `java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate schema user.avsc .`
- `generate-specific` generates definitions that extend Avro's `SpecificRecordBase`:
  `java -jar /path/to/avrohugger-tools_2.12-2.8.3-assembly.jar generate-specific schema user.avsc .`
1) If your framework is one that relies on reflection to get the Schema, it will fail, since Scala fields are private. Preempt this by passing a Schema to DatumReaders and DatumWriters (e.g. `val sdw = new SpecificDatumWriter[MyRecord](schema)`).

2) For the `SpecificRecord` format, generated case class fields must be mutable (`var`) in order to be compatible with the SpecificRecord API. Note: if your framework allows `GenericRecord`, avro4s provides a type class that converts to and from immutable case classes cleanly.

3) `SpecificRecord` requires that `enum` be represented as `JavaEnum`.
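Putting point 1 into practice, here is a sketch of a binary round trip that passes the schema explicitly, avoiding the reflection issue (`MyRecord` stands in for a hypothetical generated `SpecificRecord` class, with its schema assumed to be available from its companion object):

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}

val record = MyRecord("example")             // hypothetical generated case class
val schema = record.getSchema                // schema taken from the instance itself

// Passing the schema explicitly means no reflective field access is needed
val writer = new SpecificDatumWriter[MyRecord](schema)
val out = new ByteArrayOutputStream()
val encoder = EncoderFactory.get().binaryEncoder(out, null)
writer.write(record, encoder)
encoder.flush()

val reader = new SpecificDatumReader[MyRecord](schema)
val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
val decoded = reader.read(null.asInstanceOf[MyRecord], decoder)
```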
To test for regressions, please run `sbt:avrohugger> + test`.

To test that generated code can be de/serialized as expected, please run:

1) `sbt:avrohugger> + publishLocal`
2) then clone sbt-avrohugger and update its avrohugger dependency to the locally published version
3) finally run `sbt:sbt-avrohugger> scripted avrohugger/*`, or, e.g., `scripted avrohugger/GenericSerializationTests`
Depends on Avro and Treehugger. `avrohugger-tools` is based on avro-tools.

Contributors: