Closed reachfh closed 6 years ago
I made a simple project with my fingerprint code: https://github.com/cogini/avro_fingerprint It's using mix to build (I mainly write Elixir these days). This runs the test cases from the Avro Java code. erlavro gives an error on test 016, "union should have at least one member type", so it's commented out. Otherwise it shows canonicalization differences.
we would very much like to receive contribution, thanks in advance.
I will add a avro:encode_schema/2
to accept an Opt
to produce canonical form.
some notes: To produce canonical form, there are more work to do than fixing the JSON field orders.
Great. I will make a separate pull request with my avro_fingerprint code. I will try to make the changes for canonical form and make a pull request. Let me know how you would like it to work. My primary goal is that non-functional differences like whitespace and comments don't affect the hash, but of course it should be done properly for compatibility.
I would like to have it fully compatible with spec for canonical form. Yes, PR is welcome. When do you think you'll be able send the PR for canonical form? (it doesn't have to be fully implemented before sending the pr, I like early involvement).
I will probably do it in the next day or two.
fixed in #64
I am using erlavro to generate Kafka messages, and am generating a schema fingerprint as described in https://avro.apache.org/docs/1.8.2/spec.html#schema_fingerprints (I have written an Erlang implementation of the Rabin fingerprint which I could contribute to erlavro, if you would like it.)
I want to generate the fingerprint based on the Parsing Canonical Form as described in https://avro.apache.org/docs/1.8.2/spec.html#Transforming+into+Parsing+Canonical+Form I am trying to do this by running
avro_json_encoder:encode_schema outputs fields in a different order for records, though. It should be:
"[ORDER] Order the appearance of fields of JSON objects as follows: name, type, fields, symbols, items, values, size. For example, if an object has type, name, and size fields, then the name field should appear first, followed by the type and then the size fields."