linkedin / goavro

Apache License 2.0
971 stars 216 forks source link

Make TextualFromNative output deterministic #254

Open galion96 opened 2 years ago

galion96 commented 2 years ago

This is my take on resolving the #222. I used a simple approach of extracting the keys and sorting them in a deterministic way. Hopefully this is ok or can add some perspective to this issue

xmcqueen commented 2 years ago

Hi and thanks! I'm worried about this having a perf impact. goavro has been very focused on perf. Go has a decent framework for running benchmarks. Would you please include a benchmark test with this PR as in this PR #223

galion96 commented 2 years ago

Understood, will do as soon as I find available time

galion96 commented 2 years ago

On my branch

goavro % go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16                          100             20088 ns/op
BenchmarkNativeFromAvroUsingV2-16                    100           3460976 ns/op
BenchmarkBinaryFromNativeUsingV2-16                  100           1127675 ns/op
BenchmarkNativeFromBinaryUsingV2-16                  100           3291675 ns/op
BenchmarkTextualFromNativeUsingV2-16                 100          10312088 ns/op
BenchmarkNativeFromTextualUsingV2-16                 100           7148802 ns/op
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16                    100               391.8 ns/op

On master

go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16                          100             19934 ns/op
BenchmarkNativeFromAvroUsingV2-16                    100           3365947 ns/op
BenchmarkBinaryFromNativeUsingV2-16                  100           1146937 ns/op
BenchmarkNativeFromBinaryUsingV2-16                  100           3351293 ns/op
BenchmarkTextualFromNativeUsingV2-16                 100           6815802 ns/op
BenchmarkNativeFromTextualUsingV2-16                 100           7874666 ns/op
{"f1":{"k1":3.5}}
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16               {"f1":{"k1":3.5}}
     100               403.8 ns/op

@xmcqueen What did you had in mind, do you want me to add some really large map with 1M records and compare or?

ckarmann commented 1 year ago

Another way to make it deterministic that would maybe be faster than sorting the keys would be to keep the order of the fields in the records in the schema. It would be consistent with what produce Avro library for Java and also with avro-tools.

Here I think the function should should use nameFromIndex from the Record codec to loop through the names instead of using "range destMap" and that would solve the problem for records.

For map, of course there is no definitive order. The Java implementation uses LinkedHashMap to preserve insertion order. Maybe this project could switch to something similar.