cberner closed this issue 7 years ago
I'll take a look, but my first pass is hitting some speed bumps: no Docker on my box, and gcc-4.8 means I can't use `-fsanitize=leak`. I do see some probably-related leaks using valgrind memcheck, though. Have you tried the same example using protobufs instead of flatbuffers?
@llchan should work without Docker?
Yeah, I have it building with the Makefile + local mods, but I'm on gcc-4.8 so I have to use valgrind instead of the gcc one.
Cross-referencing the original serialization traits in grpc, it looks like we may be receiving ownership of the `grpc_byte_buffer *buffer` rather than a borrowed reference. That means we need to destroy it at the end of the deserialize func. Let me make some changes and see if that fixes things.
@llchan, yep my service was originally protobufs and I'm trying to port it to flatbuffers to improve throughput. The original protobufs version works fine.
Was able to get docker set up on my box and could reproduce the memory leak in your example. I think the PR above fixes it, could you give it a go?
Btw, not sure how close this code is to your actual benchmark, but a few things to note:

- `MessageBuilder` initial size. Protobufs (I think) can write to segmented buffers, so appends are relatively cheap, whereas flatbuffers require a single contiguous buffer, so appends can cause fairly large reallocs/memcpys. I tried `MessageBuilder builder(sizeof(float) * NUM_VALUES + 1024)`, but you can probably figure out a closer bound.
- You can use `builder.CreateVector(data->values()->data(), NUM_VALUES)` and skip the copy to `parameters`, though maybe you actually need to copy and this is just an artifact of the stripped-down example.

With those two changes I see a ~2.2x increase in throughput.
Cool, thanks for the tips! Ya, that second one is just an artifact of removing a lot of code, but sizing the `MessageBuilder` would probably give a good speed up in my real test.
@cberner it'd be great to hear some anecdotal numbers on what kind of speedup you're getting, if any (here's hoping the rest of gRPC does not become a bottleneck :)
@llchan now with this bug fixed, I'll do a 1.7.0 release of FlatBuffers soon, that will include this functionality, so we can get more people using this code. We may post about it.
@aardappel so far it looks like ~2x, although I'm still investigating performance bottlenecks. It seems like I should be able to get another 5x, as I'm still nowhere close to saturating the NIC on my machines.
@cberner That's a pretty exciting speedup already! Please keep us updated if you can.
@aardappel found the first of my issues, and have filed a ticket here: https://github.com/google/flatbuffers/issues/4354
There appears to be a memory leak in the GRPC integration. I've reduced it down to a small example, in this repo: https://github.com/cberner/flatbuffer_leak
You can reproduce with the following steps:
Running it produces the following error (and the process is quickly OOM-killed):