cberner closed this issue 7 years ago
I'll take a look, but my first pass is hitting some speed bumps: no Docker on my box, and gcc-4.8 means I can't use `-fsanitize=leak`. I do see some probably-related leaks using valgrind memcheck, though. Have you tried the same example using protobufs instead of flatbuffers?
@llchan should work without Docker?
Yeah, I have it building with the Makefile + local mods, but I'm on gcc-4.8 so I have to use valgrind instead of the gcc one.
Cross-referencing the original serialization traits in grpc, it looks like we may be receiving ownership of the `grpc_byte_buffer *buffer` rather than a borrowed reference. That means we need to destroy it at the end of the deserialize func. Let me make some changes and see if that fixes things.
@llchan, yep my service was originally protobufs and I'm trying to port it to flatbuffers to improve throughput. The original protobufs version works fine.
Was able to get docker set up on my box and could reproduce the memory leak in your example. I think the PR above fixes it, could you give it a go?
Btw, not sure how close this code is to your actual benchmark, but a few things to note:

- `MessageBuilder` initial size. Protobufs (I think) can write to segmented buffers, so appends are relatively cheap, whereas flatbuffers require a single contiguous buffer, so appends can cause fairly large reallocs/memcpys. I tried `MessageBuilder builder(sizeof(float) * NUM_VALUES + 1024)`, but you can probably figure out a closer bound.
- You can use `builder.CreateVector(data->values()->data(), NUM_VALUES)` and skip the copy to `parameters`, though maybe you actually need to copy and this is just an artifact of the stripped-down example.

With those two changes I see a ~2.2x increase in throughput.
Cool, thanks for the tips! Ya, that second one is just an artifact of removing a lot of code, but sizing the `MessageBuilder` would probably give a good speed up in my real test.
@cberner it'd be great to hear some anecdotal numbers on what kind of speedup you're getting, if any (here's hoping the rest of gRPC does not become a bottleneck :)
@llchan now with this bug fixed, I'll do a 1.7.0 release of FlatBuffers soon, that will include this functionality, so we can get more people using this code. We may post about it.
@aardappel so far it looks like ~2x, although I'm still investigating performance bottlenecks. It seems like I should be able to get another 5x, as I'm still nowhere close to saturating the NIC on my machines.
@cberner That's a pretty exciting speedup already! Please keep us updated if you can.
@aardappel found the first of my issues, and have filed a ticket here: https://github.com/google/flatbuffers/issues/4354
There appears to be a memory leak in the GRPC integration. I've reduced it down to a small example, in this repo: https://github.com/cberner/flatbuffer_leak
You can reproduce with the following steps:
Running it produces the following error (and the process is quickly OOM-killed):