planetscale / vtprotobuf

A Protocol Buffers compiler that generates optimized marshaling & unmarshaling Go code for ProtoBuf APIv2
BSD 3-Clause "New" or "Revised" License

Better support for maps #12

Open ericrenard opened 3 years ago

ericrenard commented 3 years ago

Hello! While testing this library a bit in our code base, I noticed that map support with pooling is not perfect.

Making these changes manually in the generated .pb.go file greatly reduces allocations and speeds up unmarshalling.

I can provide a sample proto file and modifications made if needed.

Pre changes:

```
BenchmarkUnmarshalStdProto-12          202058   4971 ns/op   3840 B/op   54 allocs/op
BenchmarkUnmarshalVTProto-12           228591   5238 ns/op   3429 B/op   47 allocs/op
BenchmarkUnmarshalVTProtoWithPool-12   238689   4967 ns/op   2605 B/op   44 allocs/op
```

Post changes:

```
BenchmarkUnmarshalStdProto-12          203602   5240 ns/op   3840 B/op   54 allocs/op
BenchmarkUnmarshalVTProto-12           199917   5864 ns/op   3433 B/op   47 allocs/op
BenchmarkUnmarshalVTProtoWithPool-12   601562   2009 ns/op    302 B/op    5 allocs/op
```
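(For context: benchmarks like the ones above typically follow the pattern sketched below. `SampleMessage` and `buildPayload` are illustrative placeholders, not names from the issue; the `FromVTPool`/`ReturnToVTPool` pair is the shape of what vtprotobuf's pool feature generates for a message.)

```go
package bench

import "testing"

// A pooled unmarshal benchmark: take a message from the pool, unmarshal
// into it, and return it, so sync.Pool reuse shows up as fewer B/op and
// allocs/op. SampleMessage and buildPayload are hypothetical.
func BenchmarkUnmarshalVTProtoWithPool(b *testing.B) {
	payload := buildPayload() // hypothetical helper producing wire bytes
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		msg := SampleMessageFromVTPool() // generated by the pool feature
		if err := msg.UnmarshalVT(payload); err != nil {
			b.Fatal(err)
		}
		msg.ReturnToVTPool()
	}
}
```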

vmg commented 3 years ago

Hi @ericrenard!

> I can provide a sample proto file and modifications made if needed.

I would appreciate that. I've been playing around with potential ways to pool the maps, but nothing has proven to be significantly faster. Can you share some code?

ericrenard commented 3 years ago

Hi, sorry for replying so late, I was busy with other stuff :p I uploaded an example here: https://github.com/MolotovTv/benchvitess. sample_vtproto.pb.go.standard is the file currently generated for the included proto, and sample_vtproto.pb.go is my hacked version. Basically, what I'm doing is using the pool when a map value is a message, and also pooling the map itself (which is maybe not worth it).
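(To make the described edit concrete, here is a minimal sketch of the idea using a plain sync.Pool. `Inner` is an illustrative stand-in for a generated message used as a map value; the real change goes into the generated UnmarshalVT, which allocates map values fresh for every entry.)

```go
package sketch

import "sync"

// Inner stands in for a generated message type used as a map value.
type Inner struct {
	Value []byte
}

// innerPool approximates the InnerFromVTPool / ReturnToVTPool pair that
// the vtprotobuf pool feature generates for each message.
var innerPool = sync.Pool{New: func() any { return new(Inner) }}

// decodeMapEntry sketches the relevant piece of a generated UnmarshalVT:
// instead of allocating the map value with &Inner{} for every entry, the
// hand-edited version takes it from the pool and reuses its buffers.
func decodeMapEntry(m map[string]*Inner, key string, raw []byte) {
	v := innerPool.Get().(*Inner) // was: v := &Inner{}
	v.Value = append(v.Value[:0], raw...)
	m[key] = v
}

// releaseMapValues is the counterpart a ReturnToVTPool implementation
// would need: hand every map value back before the map itself is dropped.
func releaseMapValues(m map[string]*Inner) {
	for k, v := range m {
		delete(m, k)
		innerPool.Put(v)
	}
}
```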

Pre changes benchmark:

```
BenchmarkMarshal/marshalStd-12                 27   41980015 ns/op   2184862 B/op   200005 allocs/op
BenchmarkMarshal/marshalVT-12                  87   12711687 ns/op   1384449 B/op        1 allocs/op
BenchmarkUnmarshal/unMarshalStd-12         266701       3751 ns/op      1274 B/op       24 allocs/op
BenchmarkUnmarshal/unMarshalVT-12          830570       1262 ns/op      1186 B/op       13 allocs/op
BenchmarkUnmarshal/unMarshalVTWithPool-12  830074       1295 ns/op      1186 B/op       13 allocs/op
```

Post changes benchmark:

```
BenchmarkMarshal/marshalStd-12                 25   44627198 ns/op   2184793 B/op   200005 allocs/op
BenchmarkMarshal/marshalVT-12                  84   15979075 ns/op   1384450 B/op        1 allocs/op
BenchmarkUnmarshal/unMarshalStd-12         322051       3813 ns/op      1274 B/op       24 allocs/op
BenchmarkUnmarshal/unMarshalVT-12          415844       2787 ns/op      1187 B/op       13 allocs/op
BenchmarkUnmarshal/unMarshalVTWithPool-12  967172       1215 ns/op         3 B/op        0 allocs/op
```

I think pooling the values from the map is always better, but obviously the more objects in the map, the more difference it makes. For example, with 1000 keys (first line before the change, second line after):

```
BenchmarkUnmarshal/unMarshalVTWithPool-12    7194   149844 ns/op   147475 B/op   1030 allocs/op
BenchmarkUnmarshal/unMarshalVTWithPool-12    9793   118702 ns/op      192 B/op      0 allocs/op
```

While the speed difference is not huge in a benchmark, I think it makes a big difference in GC pressure in the context of, for example, microservices whose main job is marshalling and unmarshalling.
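(The savings above only materialize if the application actually returns messages, and with this change their map values, to the pool after each request. A minimal sketch of that lifecycle, assuming a generated message with the pool feature enabled; `Sample` and `process` are illustrative names.)

```go
// handleRequest sketches the hot path of such a microservice: take a
// message from the pool, unmarshal, use it, and hand it back, so pooled
// map values are recycled instead of becoming garbage on every request.
func handleRequest(payload []byte) error {
	msg := SampleFromVTPool() // generated by the vtprotobuf pool feature
	defer msg.ReturnToVTPool()
	if err := msg.UnmarshalVT(payload); err != nil {
		return err
	}
	process(msg) // hypothetical application logic
	return nil
}
```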

dpmabo commented 3 months ago

Any updates on this issue?