hamba / avro

A fast Go Avro codec
MIT License
387 stars 95 forks source link

Potential Bug - Adding default value of null doesn't change schema encoder hash #449

Open TheGreatAbyss opened 1 month ago

TheGreatAbyss commented 1 month ago

Hello, and once again thank you so much for this library and your support of it.

I ran into what may or may not be a bug depending on what is considered a "valid" schema.

I created my initial subject with a union field in the schema, but by mistake I forgot to specify a default value:

...
{"type": ["null", "string"], "logicalType": "UUID", "name": "publication_id"},
...

When a map[interface] value is passed to the encoder but does not include a publication_id it will appropriately throw an error.

Later I realized my "mistake", and I wanted to change the application behavior so it sets a Null value for publication_id instead of throwing an error. I updated the subject and set a new schema_id with the below schema:

...
{"type": ["null", "string"], "logicalType": "UUID", "name": "publication_id", "default": None},
...

My application runs a background thread that calls GetLatestSchemaInfo every two minutes. To my surprise after the schema was updated I was still getting the avro encoding error about missing publication_id. I banged my head for a while trying to figure out what was wrong with my code, and then finally realized that there is an internal cache for the schema encoder.

The problem is that both schemas posted above hash to the same value. If my initial schema was as below it would have worked as expected as this hashes to something else:

...
{"type": "string", "logicalType": "UUID", "name": "publication_id"},
...

Is this a bug? Is there no valid use case for specifying a union of null and string without setting a default?

Thank You!

nrwiersma commented 1 month ago

Technically speaking, the default does not apply to the encoder, and is used only as a convenience in this library. By spec, the default does not apply to writes, only read compatibility. This is why default is not included in the write hash.

From the spec:

default: A default value for this field, only used when reading instances that lack the field for schema evolution purposes. The presence of a default value does not make the field optional at encoding time.

TheGreatAbyss commented 1 month ago

Thanks for your response.

"By spec, the default does not apply to writes, only read compatibility."

I guess I'm a bit confused because it does effect how the library behaves on write. Without default an error is thrown, with default it writes "publication_id":null,

nrwiersma commented 1 month ago

This is the convenience mechanism I mentioned. While not mandated by spec, the library will use the default for writes if it exists.

TheGreatAbyss commented 1 month ago

It's not a major issue but if the library uses the default if it exists, then maybe it should be taken into account in the cache if the default changes (or is added in this case) as it does effect the behavior?