hortonworks / registry

Schema Registry
Apache License 2.0

Schema fingerprinting handles union ordering as distinctive feature #801

Open balassai opened 1 year ago

balassai commented 1 year ago

In Avro, a union can contain any number of types. The first type has a special meaning: a field's default value must match it (this is why optional fields are represented as ["null", "something"]).

Fingerprinting generates different results whenever the union order differs. In practice, only the position of the first type should matter; the remaining types can appear in any order.

E.g.

["null", "string", "int"] and ["int", "null", "string"] are different.

["null", "string", "int"] and ["null", "int", "string"] are practically the same.
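To make the proposal concrete, here is a minimal Python sketch of a fingerprint that keeps the first union branch in place and sorts the remaining branches before hashing. It is not the registry's actual fingerprinting code: the function names `normalize` and `fingerprint`, the use of MD5, and the JSON serialization settings are all assumptions chosen for illustration.

```python
import hashlib
import json


def normalize(schema):
    """Return a copy of a parsed Avro schema with union branches reordered.

    Hypothetical helper: the first branch of every union stays first,
    the remaining branches are sorted by their JSON text.
    """
    if isinstance(schema, list):  # a JSON array in schema position is a union
        branches = [normalize(b) for b in schema]
        head, tail = branches[:1], branches[1:]
        return head + sorted(tail, key=lambda b: json.dumps(b, sort_keys=True))
    if isinstance(schema, dict):
        out = dict(schema)
        # Nested schemas live under these keys (record/array/map definitions).
        for key in ("type", "items", "values"):
            if key in out:
                out[key] = normalize(out[key])
        if "fields" in out:
            # Field order is preserved; only each field's type is normalized.
            out["fields"] = [
                {**f, "type": normalize(f["type"])} for f in out["fields"]
            ]
        return out
    return schema  # primitive or named-type reference, e.g. "string"


def fingerprint(schema):
    """Hash the normalized schema (MD5 is an assumption, not the registry's choice)."""
    text = json.dumps(normalize(schema), sort_keys=True, separators=(",", ":"))
    return hashlib.md5(text.encode("utf-8")).hexdigest()


a = ["null", "string", "int"]
b = ["null", "int", "string"]
c = ["int", "null", "string"]
same_tail_order_ignored = fingerprint(a) == fingerprint(b)
first_branch_respected = fingerprint(a) != fingerprint(c)
```

With this sketch, `a` and `b` hash to the same value because they differ only after the first branch, while `c` hashes differently because its first branch is `"int"`. Fingerprints computed this way would not match values already stored in the database.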

IMPORTANT NOTE: fingerprints are stored in the database, so backward compatibility must be addressed if changing this logic.