[Closed] inafets closed this issue 6 years ago
Great progress, @inafets. I've only had time to look into (2), and I can reproduce the error, so for now I have only a hypothesis: the fact that the failure comes from GenericData, even though we're using the Specific API, suggests that there was a schema mismatch somewhere and the data "fell through" to Avro's default. This often happens when using the Specific API from Scala: the API is Java, of course, and is full of reflective calls, and since Scala's fields are private, Avro can't reflect over the classes properly to find the schema. Perhaps that's what's happening here: https://github.com/julianpeeters/avrohugger/pull/88/files#diff-fa7f84bf817786b715229d53269c8a01R16
Avro provides a workaround in some cases, allowing us to pass in a schema to preempt reflection, but I haven't had a chance to look into whether or not that will work for us here.
Regarding (1), does that mean it's been tested with Java-generated classes and the Specific API? I also wonder whether a solution could be to wrap the logical type in a record when one wishes to use it in a union, and we could try to make that caveat clear in the README. In that worst case, would that suffice for your uses?
Hopefully I'll have a chance to look at (1), and further into (2), this weekend.
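The record-wrapping workaround mentioned above can be sketched as a schema. This is only an illustration; the `Outer`/`AmountWrapper` names and fields are made up. The point is that the decimal no longer sits directly in the union: it sits on a field of a named record, where its logical type is preserved.

```scala
import org.apache.avro.Schema

// Union with a wrapper record instead of a bare decimal (illustrative names).
val wrapped = new Schema.Parser().parse(
  """{"type":"record","name":"Outer","namespace":"test","fields":[
       {"name":"maybeAmount","type":["null",
         {"type":"record","name":"AmountWrapper","fields":[
           {"name":"amount","type":{"type":"bytes","logicalType":"decimal","precision":20,"scale":8}}]}],
        "default":null}]}""")

val amountSchema = wrapped
  .getField("maybeAmount").schema() // the union
  .getTypes.get(1)                  // the AmountWrapper branch
  .getField("amount").schema()      // the decimal field inside it
assert(amountSchema.getLogicalType != null) // decimal(20,8) survives
```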
After a few more tests with generated Java classes, I would say that unions with logical types do not work in general. There is https://issues.apache.org/jira/browse/AVRO-2065, and in any case even adding the converter doesn't help, because the Avro code does not detect the logicalType correctly in the case of unions. About the BigDecimal problem: the generated Java class uses java.math.BigDecimal. I am not sure how to test this further.
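The detection failure described above is visible directly in Avro's schema model: a union schema itself never carries a logical type, only its branches do, so any code that calls getLogicalType on the field's (union) schema comes up empty. A minimal demonstration:

```scala
import org.apache.avro.Schema

// A field whose type is a union containing a decimal-annotated bytes schema.
val schema = new Schema.Parser().parse(
  """{"type":"record","name":"R","fields":[
       {"name":"d","type":["null",
         {"type":"bytes","logicalType":"decimal","precision":20,"scale":8}],
        "default":null}]}""")

val fieldSchema = schema.getField("d").schema()
// The union itself reports no logical type...
assert(fieldSchema.getLogicalType == null)
// ...even though the branch inside it has one.
assert(fieldSchema.getTypes.get(1).getLogicalType != null)
```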
Agreed about unions, let's punt on them.
I think I have something that gets us further along. What do you think of the work in progress (below)? I only got as far as BigDecimal before stumbling on another error for LocalDateTime, but the principle should be the same: if we convert in get and put, then there's no need to override so much extra SpecificRecordBase machinery, nor any new dependency on avrohugger.
What I have below seems to work for BigDecimal, though it may need some more editing. I'd usually put such conversions in the existing converters package. Also, your comment about java.math.BigDecimal makes me think I may have overlooked a simpler way (doesn't Scala's BigDecimal have a constructor that takes a Java BigDecimal?).
If you don't beat me to it, I think I'll have some time to look into it this weekend.
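On the java.math.BigDecimal question: yes, scala.math.BigDecimal's companion has an apply that wraps a java.math.BigDecimal, and the wrapper exposes the underlying Java value via .bigDecimal, so converting between the two is cheap. A quick check:

```scala
// Round trip between java.math.BigDecimal and scala.math.BigDecimal.
val javaBd  = new java.math.BigDecimal("123.45678900")
val scalaBd = BigDecimal(javaBd)   // Java -> Scala wrapper
val javaOut = scalaBd.bigDecimal   // Scala -> Java again
assert(javaOut == javaBd)
assert(javaOut.scale == 8)         // scale is preserved
```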
/** MACHINE-GENERATED FROM AVRO SCHEMA. DO NOT EDIT DIRECTLY */
package test

import scala.annotation.switch

case class LogicalSc(var data: BigDecimal, var ts: java.time.LocalDateTime, var dt: java.time.LocalDate) extends org.apache.avro.specific.SpecificRecordBase {
  def this() = this(scala.math.BigDecimal(0), java.time.LocalDateTime.now, java.time.LocalDate.now)
  // private val conversions: List[org.apache.avro.Conversion[_]] = List(avrohugger.format.specific.conversions.DecimalConversion, avrohugger.format.specific.conversions.DateTimeConversion, avrohugger.format.specific.conversions.DateConversion)
  // override def getConversion(field$: Int): org.apache.avro.Conversion[_] = {
  //   conversions(field$)
  // }
  // private val encoder = new org.apache.avro.message.BinaryMessageEncoder[LogicalSc](LogicalSc.MODEL$, getSchema)
  // def toByteBuffer: java.nio.ByteBuffer = {
  //   encoder.encode(this)
  // }
  // private val WRITER$ = LogicalSc.MODEL$.createDatumWriter(getSchema).asInstanceOf[org.apache.avro.io.DatumWriter[LogicalSc]]
  // override def writeExternal(out: java.io.ObjectOutput) {
  //   WRITER$.write(this, org.apache.avro.specific.SpecificData.getEncoder(out))
  // }
  // private val READER$ = LogicalSc.MODEL$.createDatumReader(getSchema).asInstanceOf[org.apache.avro.io.DatumReader[LogicalSc]]
  // override def readExternal(in: java.io.ObjectInput) {
  //   READER$.read(this, org.apache.avro.specific.SpecificData.getDecoder(in))
  // }
  def get(field$: Int): AnyRef = {
    (field$: @switch) match {
      case 0 => {
        // Scale the value to the schema's scale, then encode via the decimal conversion
        val decimalType = getSchema.getFields().get(field$).schema().getLogicalType().asInstanceOf[org.apache.avro.LogicalTypes.Decimal]
        val scale = decimalType.getScale()
        val scaledValue = data.setScale(scale)
        LogicalSc.decimalConversion.toBytes(scaledValue.bigDecimal, null, decimalType)
      }.asInstanceOf[AnyRef]
      case 1 => {
        ts
      }.asInstanceOf[AnyRef]
      case 2 => {
        dt
      }.asInstanceOf[AnyRef]
      case _ => new org.apache.avro.AvroRuntimeException("Bad index")
    }
  }
  def put(field$: Int, value: Any): Unit = {
    (field$: @switch) match {
      case 0 => this.data = {
        value match {
          case (buffer: java.nio.ByteBuffer) => {
            val decimalType = getSchema.getFields().get(field$).schema().getLogicalType()
            // fromBytes returns a java.math.BigDecimal; wrap it in the Scala one
            BigDecimal(LogicalSc.decimalConversion.fromBytes(buffer, null, decimalType))
          }
        }
      }
      case 1 => this.ts = {
        value
      }.asInstanceOf[java.time.LocalDateTime]
      case 2 => this.dt = {
        value
      }.asInstanceOf[java.time.LocalDate]
      case _ => new org.apache.avro.AvroRuntimeException("Bad index")
    }
    ()
  }
  def getSchema: org.apache.avro.Schema = LogicalSc.SCHEMA$
}

object LogicalSc {
  val SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"LogicalSc\",\"namespace\":\"test\",\"fields\":[{\"name\":\"data\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":20,\"scale\":8}},{\"name\":\"ts\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}},{\"name\":\"dt\",\"type\":{\"type\":\"int\",\"logicalType\":\"date\"}}]}")
  val decimalConversion = new org.apache.avro.Conversions.DecimalConversion
  // private val MODEL$ = new org.apache.avro.specific.SpecificData
  // private val decoder = new org.apache.avro.message.BinaryMessageDecoder[LogicalSc](MODEL$, SCHEMA$)
  // def getDecoder: org.apache.avro.message.BinaryMessageDecoder[LogicalSc] = {
  //   decoder
  // }
  // def fromByteBuffer(b: java.nio.ByteBuffer): LogicalSc = {
  //   decoder.decode(b)
  // }
}
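As a standalone sanity check of the decimal logic in the get/put above (this sketch is not part of the generated class), the same Conversions.DecimalConversion can be exercised round-trip against a decimal-annotated bytes schema:

```scala
import org.apache.avro.{LogicalTypes, Schema}
import org.apache.avro.Conversions.DecimalConversion

val decimalType = LogicalTypes.decimal(20, 8)
val schema      = decimalType.addToSchema(Schema.create(Schema.Type.BYTES))
val conversion  = new DecimalConversion

// Same shape as get/put: scale first, then to/from bytes.
val original = scala.math.BigDecimal("123.456").setScale(8)
val bytes    = conversion.toBytes(original.bigDecimal, schema, decimalType)
val decoded  = BigDecimal(conversion.fromBytes(bytes, schema, decimalType))
assert(decoded == original)
```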
Currently some of the tests do not pass, for two reasons. I have removed the first cause from the tests for the moment. I need advice on how to proceed.