sksamuel / avro4s

Avro schema generation and serialization / deserialization for Scala

Json output captured in BigDecimalOutputStreamTest does not contain correct BigDecimal values #787

Open codesurf42 opened 1 year ago

codesurf42 commented 1 year ago

There is an issue when serializing BigDecimal values to JSON.

An easy way to capture it is to modify def writeRead[T: Encoder : SchemaFor](t: T)(fn: GenericRecord => Any): Unit in OutputStreamTest.scala so that the produced JSON output is printed:

    {
      val out = writeJson(t)
      println(s"OUT-J : $out") // print the JSON before it is decoded back
      val record = readJson(out)
      fn(record)
    }

With this modification in place, we see outputs such as:

Test(4.12): OUT-J : {"z":"\u0001œ"}

Test(1234.56): OUT-J : {"z":"\u0001â@"}

Test(List(5.63, 17.92)): OUT-J : {"z":["\u00023","\u0007\u0000"]}

However, since the generated JSON decoders consume the same format, and there is no test asserting on the produced JSON output itself, the tests that use writeRead() pass.

An explicit way to reproduce it looks like this:

  test("BigDecimal - json") {
    type MyType = BigDecimal
    val avroSchema = AvroSchema[MyType]
    val baos = new ByteArrayOutputStream()
    val os = AvroOutputStream
      .json[MyType]
      .to(baos)
      .build()

    val serializedValue = BigDecimal("15.12")
    os.write(Seq(serializedValue))
    os.flush()
    os.close()

    println("BAOS: " + baos.toString)

    // deserialize:
    val bais = new ByteArrayInputStream(baos.toByteArray)
    println(s"bais: ${baos.toString}")
    val is = AvroInputStream.json[MyType].from(bais).build(avroSchema)
    val deserialized = is.iterator.toSet
    is.close()

    println("BAIS: " + deserialized)
    assertEquals(deserialized, Set(serializedValue))
  }

This produces:

BAOS: "\u0005è"
BAIS: Set(15.12)

instead of the expected value (for smaller numbers it can produce a single character such as "n" or "p" as the value).

It seems that the JSON encoder cannot produce correct values for BigDecimals, or there is some issue with the character encoding or byte order when writing/reading BigDecimal values.
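For what it is worth, the captured strings look like the big-endian two's-complement bytes of the unscaled decimal value read back as characters: 15.12 at scale 2 has unscaled value 1512 == 0x05E8, matching "\u0005è" (è is 0xE8). A minimal sketch (plain Scala, independent of avro4s) to check this correspondence:

    import java.nio.charset.StandardCharsets

    val bd = BigDecimal("15.12")
    // big-endian two's-complement bytes of the unscaled value 1512
    val unscaledBytes = bd.underlying.unscaledValue.toByteArray
    println(unscaledBytes.map(b => f"0x${b & 0xff}%02X").mkString(" ")) // 0x05 0xE8

    // reading those bytes as ISO-8859-1 yields exactly the captured string
    println(new String(unscaledBytes, StandardCharsets.ISO_8859_1)) // "\u0005è"

    // smaller unscaled values map to a single printable character,
    // e.g. 1.10 -> 110 -> "n" and 1.12 -> 112 -> "p"

If that is what is happening, the JSON output is a string rendering of the raw bytes of the decimal logical type rather than a readable number.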

I wanted to check whether this is somehow caused by json4s.native, which is used for JSON under the hood; however, json4s on its own produces correct results for BigDecimals (see the sketch below).
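As a quick sanity check, a minimal sketch using json4s directly (outside avro4s) produces the expected textual form:

    import org.json4s.JDecimal
    import org.json4s.native.JsonMethods.{compact, render}

    // json4s by itself renders a BigDecimal as a plain JSON number
    println(compact(render(JDecimal(BigDecimal("15.12"))))) // prints 15.12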

I would be happy for any help or a pointer on where to look further to find the cause of this issue.

I checked this on the Scala 2.13 build (the 4.0.x branch).

Many thanks!