hamba / avro

A fast Go Avro codec
MIT License
353 stars 82 forks source link

Naming rules inconsistencies #414

Open papanikge opened 1 week ago

papanikge commented 1 week ago

Hello all,

Here's the situation. Apache Avro spec has some naming guidelines for Record, enums and fixed. This library correctly checks that and fails when a file has a badly named field.

However, the official Avro library does not 😢 avro-tools tojson <badly-named-file.ocf> happily reads the file (same goes for AWS - I think that's because they use Java)

There is a discussion about it here in the mailing list, and furthermore Python seems to have chosen to allow for disabling this validation

Because of that last note, I think we should consider adding this optional functionality in this library too. What do you think? I can do a PR if you're on board.

nrwiersma commented 1 week ago

Well, I will not pretend to like this. This is some serious ass backward stuff. Asking other libraries to cripple themselves cause other people cannot read a spec or make a decision is really not great.

That said, if it is turned off by default and is documented as counter spec, and does not add too much complexity, it would be accepted.

papanikge commented 2 days ago

Opened https://github.com/hamba/avro/pull/415 Unfortunately I had to do property-drilling for that boolean configuration :/ Also I had to add the WithSkipNameValidation func in two spots, both in ocf and in schema file in the avro package.

Feel free to roast it and suggest any alternatives :) I'm not very accustomed to this repo's code yet.

I tested it in our systems using my fork and all our suite's tests seem to pass even for badly named files. However, I would like to add some more tests here too.