julianpeeters / sbt-avrohugger

sbt plugin for generating Scala sources for Apache Avro schemas and protocols.
Apache License 2.0

Add the tests for the logical types in specific records. #54

Closed inafets closed 6 years ago

inafets commented 6 years ago

Currently some of the tests do not pass for two reasons:

  1. apparently Avro is not able to decode unions containing logical types unless the value is null
  2. the comparison in the test fails with this error: java.lang.ClassCastException: scala.math.BigDecimal cannot be cast to java.lang.Comparable (GenericData.java:998)

I have removed the first cause from the tests for the moment. I need advice on how to proceed.

julianpeeters commented 6 years ago

Great progress, @inafets. I've only had time to look into (2), and I can reproduce the error, so I have only a hypothesis: the failure coming from GenericData while we're using the Specific API suggests a schema mismatch somewhere, causing the data to "fall through" to Avro's default. This often happens when using the Specific API from Scala: the API is Java, full of reflective calls, and since Scala's fields are private, Avro can't reflect over the classes properly to find the schema. Perhaps that's what's happening here?: https://github.com/julianpeeters/avrohugger/pull/88/files#diff-fa7f84bf817786b715229d53269c8a01R16

Avro provides a workaround in some cases, allowing us to pass in a schema to preempt reflection, but I haven't had a chance to look into whether or not that will work for us here.
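For concreteness, here's a minimal sketch of that schema-first construction, borrowing the LogicalSc class from the WIP later in this thread; whether this actually preempts the GenericData fallback in our failing test is exactly what I haven't checked:

import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}

object SchemaFirstSketch {
  // Reflection-based construction: Avro inspects the class itself, and
  // private Scala fields can make it fall back to GenericData behavior.
  val reflective = new SpecificDatumWriter[test.LogicalSc](classOf[test.LogicalSc])

  // Schema-first construction: the schema is supplied up front, so Avro
  // never needs to reflect over the fields to find it.
  val writer = new SpecificDatumWriter[test.LogicalSc](test.LogicalSc.SCHEMA$)
  val reader = new SpecificDatumReader[test.LogicalSc](test.LogicalSc.SCHEMA$, test.LogicalSc.SCHEMA$)
}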

Regarding (1), does that mean it's been tested with Java-generated classes and the Specific API? I also wonder if a solution could be to wrap the logical type in a record when one wishes to use it in a union (sketched below), and we could attempt to make that caveat clear in the README. In that worst case, would that suffice for your uses?
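A hypothetical sketch of that wrapper schema (the names MaybeAmount and Amount are made up): the union's non-null branch is a record, and the decimal logical type lives inside it rather than being a union branch itself:

val wrapped = new org.apache.avro.Schema.Parser().parse(
  """{"type":"record","name":"MaybeAmount","namespace":"test","fields":[
    |  {"name":"amount","type":["null",
    |    {"type":"record","name":"Amount","fields":[
    |      {"name":"value","type":{"type":"bytes","logicalType":"decimal","precision":20,"scale":8}}
    |    ]}],"default":null}
    |]}""".stripMargin)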

Hopefully I'll have a chance to look at (1) and further into (2) this weekend

inafets commented 6 years ago

After a few more tests with generated Java classes, I would say that unions with logical types do not work in general. There is https://issues.apache.org/jira/browse/AVRO-2065, and in any case even adding the converter doesn't help, because the Avro code does not detect the logicalType correctly in the case of unions. About the BigDecimal problem: the generated Java class uses java.math.BigDecimal. I am not sure how to test this further.

julianpeeters commented 6 years ago

Agreed about unions, let's punt on them.

I think I have something that gets us further along. What do you think of the work in progress (below)? I only got as far as BigDecimal before stumbling on another error for LocalDateTime, but the principle should be the same: if we convert in get and put, then there's no need to override so many extra SpecificRecordBase members, nor to add a new dependency on avrohugger.

What I have below seems to work for BigDecimal, though it may need some more editing. I'd usually put such conversions in the existing converters package. Also, your comment about java.math.BigDecimal makes me think I may have overlooked a simpler way (doesn't Scala's BigDecimal have a constructor that takes a Java BigDecimal?).
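(For reference, converting between the two is a one-liner in each direction:)

// scala.math.BigDecimal wraps a java.math.BigDecimal and exposes it via .bigDecimal
val j: java.math.BigDecimal    = new java.math.BigDecimal("123.45678901")
val s: scala.math.BigDecimal   = scala.math.BigDecimal(j)  // Java -> Scala
val back: java.math.BigDecimal = s.bigDecimal              // Scala -> Java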

If you don't beat me to it, I think I'll have some time to look into it this weekend

/** MACHINE-GENERATED FROM AVRO SCHEMA. DO NOT EDIT DIRECTLY */
package test

import scala.annotation.switch

case class LogicalSc(var data: BigDecimal, var ts: java.time.LocalDateTime, var dt: java.time.LocalDate) extends org.apache.avro.specific.SpecificRecordBase {
  def this() = this(scala.math.BigDecimal(0), java.time.LocalDateTime.now, java.time.LocalDate.now)
  // private val conversions: List[org.apache.avro.Conversion[_]] = List(avrohugger.format.specific.conversions.DecimalConversion, avrohugger.format.specific.conversions.DateTimeConversion, avrohugger.format.specific.conversions.DateConversion)
  // override def getConversion(field$: Int): org.apache.avro.Conversion[_] = {
  //   conversions(field$)
  // }
  // private val encoder = new org.apache.avro.message.BinaryMessageEncoder[LogicalSc](LogicalSc.MODEL$, getSchema)
  // def toByteBuffer: java.nio.ByteBuffer = {
  //   encoder.encode(this)
  // }
  // private val WRITER$ = LogicalSc.MODEL$.createDatumWriter(getSchema).asInstanceOf[org.apache.avro.io.DatumWriter[LogicalSc]]
  // override def writeExternal(out: java.io.ObjectOutput) {
  //   WRITER$.write(this, org.apache.avro.specific.SpecificData.getEncoder(out))
  // }
  // private val READER$ = LogicalSc.MODEL$.createDatumReader(getSchema).asInstanceOf[org.apache.avro.io.DatumReader[LogicalSc]]
  // override def readExternal(in: java.io.ObjectInput) {
  //   READER$.read(this, org.apache.avro.specific.SpecificData.getDecoder(in))
  // }
  def get(field$: Int): AnyRef = {
    (field$: @switch) match {
      case 0 => {
        val decimalType = getSchema.getFields().get(field$).schema().getLogicalType().asInstanceOf[org.apache.avro.LogicalTypes.Decimal]
        val scale = decimalType.getScale()
        val scaledValue = data.setScale(scale)
        LogicalSc.decimalConversion.toBytes(scaledValue.bigDecimal, null, decimalType)
      }.asInstanceOf[AnyRef]
      case 1 => {
        ts
      }.asInstanceOf[AnyRef]
      case 2 => {
        dt
      }.asInstanceOf[AnyRef]
      case _ => throw new org.apache.avro.AvroRuntimeException("Bad index")
    }
  }
  def put(field$: Int, value: Any): Unit = {
    (field$: @switch) match {
      case 0 => this.data = {
        value match {
          case buffer: java.nio.ByteBuffer => {
            val decimalType = getSchema.getFields().get(field$).schema().getLogicalType()
            // fromBytes returns a java.math.BigDecimal; rewrap it as a Scala
            // BigDecimal so the asInstanceOf[BigDecimal] below can't fail at runtime
            BigDecimal(LogicalSc.decimalConversion.fromBytes(buffer, null, decimalType))
          }
        }
      }.asInstanceOf[BigDecimal]
      case 1 => this.ts = {
        value
      }.asInstanceOf[java.time.LocalDateTime]
      case 2 => this.dt = {
        value
      }.asInstanceOf[java.time.LocalDate]
      case _ => throw new org.apache.avro.AvroRuntimeException("Bad index")
    }
    ()
  }
  def getSchema: org.apache.avro.Schema = LogicalSc.SCHEMA$
}

object LogicalSc {
  val SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"LogicalSc\",\"namespace\":\"test\",\"fields\":[{\"name\":\"data\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":20,\"scale\":8}},{\"name\":\"ts\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}},{\"name\":\"dt\",\"type\":{\"type\":\"int\",\"logicalType\":\"date\"}}]}")
  val decimalConversion = new org.apache.avro.Conversions.DecimalConversion
  // private val MODEL$ = new org.apache.avro.specific.SpecificData
  // private val decoder = new org.apache.avro.message.BinaryMessageDecoder[LogicalSc](MODEL$, SCHEMA$)
  // def getDecoder: org.apache.avro.message.BinaryMessageDecoder[LogicalSc] = {
  //   decoder
  // }
  // def fromByteBuffer(b: java.nio.ByteBuffer): LogicalSc = {
  //   decoder.decode(b)
  // }
}