sksamuel / avro4s

Avro schema generation and serialization / deserialization for Scala
Apache License 2.0
714 stars 236 forks

fix: avro enum sorting by symbol position (as is) not alphabetical #804

Closed ThijsBroersen closed 8 months ago

ThijsBroersen commented 8 months ago

This PR addresses an issue referenced in #769.

Avro4s incorrectly sorts enum symbols alphabetically instead of keeping the order in which they are defined in the schema. The Avro spec states that enum sort order is based on symbol position: https://avro.apache.org/docs/1.11.1/specification/#sort-order
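To make the difference concrete, here is a minimal sketch in plain Scala (no avro4s or Avro dependency). The symbol list and the `byPosition` helper are hypothetical names used only for illustration; the sketch compares two enum values by their position in the declaration, as the spec describes, and contrasts that with the default alphabetical comparison.

```scala
object EnumPositionOrder {
  // Hypothetical symbol list, in the order a schema might declare it.
  val symbols: Vector[String] = Vector("foofoo", "Luxembourg")

  // Order enum values by their index in the declaration (position-based).
  val byPosition: Ordering[String] =
    Ordering.by((s: String) => symbols.indexOf(s))

  def main(args: Array[String]): Unit = {
    // Position-based: "foofoo" (index 0) sorts before "Luxembourg" (index 1).
    println(byPosition.lt("foofoo", "Luxembourg")) // true
    // Default String ordering gives the opposite answer, since 'L' < 'f'.
    println(Ordering[String].lt("foofoo", "Luxembourg")) // false
  }
}
```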

In AvroNameSchemaTest the test data was wrong. Benelux had its subtypes in a different order than in avro_name_sealed_trait_symbol.json. The test succeeded only because sorting accidentally produced the expected order: List("foofoo", "Luxembourg").sorted == List("Luxembourg", "foofoo"), as uppercase chars precede lowercase chars.
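The coincidence can be reproduced in plain Scala. This is only a sketch of the string-sorting behaviour, not of the test itself; the all-lowercase variant is a hypothetical input added to show that the result depends on the capital "L".

```scala
object SortCoincidence {
  // The two symbols from the description above.
  val declared: List[String] = List("foofoo", "Luxembourg")

  // Hypothetical all-lowercase variant, for comparison only.
  val lowercased: List[String] = List("foofoo", "luxembourg")

  def main(args: Array[String]): Unit = {
    // Default String ordering compares UTF-16 code units: 'L' (76) < 'f' (102).
    println(declared.sorted)   // List(Luxembourg, foofoo)
    // Without the capital letter, 'f' < 'l', so the order stays as declared.
    println(lowercased.sorted) // List(foofoo, luxembourg)
  }
}
```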

To take priorities into account I added EnumOrdering, because the existing SubtypeOrdering also takes the type names into account.

ThijsBroersen commented 8 months ago

I think even more is broken. @AvroSortPriority should be used to sort in descending order. All those tests are broken now, as the priorities in the test data do not match the avsc sources.
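As a rough illustration of descending-priority ordering, here is a sketch in plain Scala. It is not the EnumOrdering from this PR: `Sym`, `order`, the Float priority type, and the rule that equal priorities keep declaration order are all assumptions made for the example.

```scala
object PrioritySketch {
  // Hypothetical model: a symbol name plus its (assumed Float) sort priority.
  final case class Sym(name: String, priority: Float)

  // Higher priority first. sortWith is stable, so symbols with equal
  // priority keep the order in which they were declared (an assumption here).
  def order(declared: List[Sym]): List[String] =
    declared.sortWith(_.priority > _.priority).map(_.name)

  def main(args: Array[String]): Unit = {
    val declared = List(Sym("A", 0f), Sym("B", 2f), Sym("C", 1f), Sym("D", 0f))
    println(order(declared)) // List(B, C, A, D)
  }
}
```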

ThijsBroersen commented 8 months ago

I changed the tests, but if this is all correct, then everyone using the 5.x version of this lib has probably been encoding/decoding data wrongly, correct?