nanopb / nanopb

Protocol Buffers with small code size
https://jpa.kapsi.fi/nanopb/
zlib License
4.36k stars 860 forks source link

Improve memory allocation performance #571

Open PetteriAimonen opened 4 years ago

PetteriAimonen commented 4 years ago

Currently nanopb relies on realloc() to efficiently increase size of allocated arrays, without extra copying.

For some allocators this assumption is incorrect. Most notably newlib-nano implements realloc() as malloc() + memcpy() + free().

The logic in pb_decode.c should be reworked so that it does something smart about this. If decoding from a buffer, one could look ahead to see how many items there are in array. If decoding from stream, some kind of preallocation logic should be applied.

PetteriAimonen commented 4 years ago

This might actually be quite important, considering how newlib-nano also has poor realloc() performance.

roblabla commented 6 days ago

This is important to me. I'm currently looking into using nanopb in the context of a windows driver. The WDK environment does not provide a way to realloc AFAICT, so the only solution is to implement our own realloc using malloc+memcpy. Doing this for every item of a repeated field is fairly slow.

In my case, the entire protobuf structure sits in memory, so a lookahead strategy would be quite effective - in theory we should be able to know the size of the structure in advance, and only do a single call to pb_realloc with the correct size.

I also wonder if using an amortized growth strategy might be worth it. Instead of reallocating on every field, we could store the repeated fields in a vec-like datastructure, which instead of allocating only one extra item in the array, has a separate length and capacity, and doubles capacity each time it needs to grow. This should give us log2(n) calls to realloc, instead of n.

PetteriAimonen commented 6 days ago

For a short-term solution, you could implement a custom arena-allocator (or use some available implementation). Almost all of the realloc growth in nanopb will occur to the latest allocation.

Though myself I would aim to use max_size and max_count to end up with static allocation bounds.

std::vector would make a lot of sense for C++. For C I'm a bit reluctant to invent a nanopb-specific vector type. Lookahead is probably the most reasonable approach, especially as almost all users are decoding from buffers.