Opensource C++ zero-copy API

xfxyjwf commented 8 years ago

Protobuf has zero-copy support to avoid copying string/bytes fields when parsing protobuf messages and it's used pretty much everywhere inside Google, but the feature has never made its way into the opensource repo. Now protobuf 3.0.0 is released and we will probably have more time to look into incremental improvements. The zero-copy API is a good candidate to be included in the next 3.x release.

Opensourcing the zero-copy API will involve:

(1) opensource related string/buffer classes (Cord and its dependencies).
(2) un-exclude zero-copy APIs from the message interface (such as ParseFromStringWithAliasing).
(3) un-exclude the support for ctype = STRING_PIECE and ctype = CORD.

(1) is probably the most difficult part as that's a large chunk of code and it may not be portable.

jjyao commented 7 years ago

Any updates on this feature?

xfxyjwf commented 7 years ago

@jjyao This unfortunately hasn't made it onto our agenda yet. If this feature is useful to you, can you post your use case here and estimate how much it would help? A more concrete use case example can help us prioritize it.

bobobo1618 commented 7 years ago

I'm also quite interested in this feature.

More concrete use case example can help us prioritize it.

@xfxyjwf I'm writing an application-specific database server with gRPC and RocksDB. I want to:

Accept serialized protos from clients through gRPC and store them in RocksDB verbatim, without parsing and constructing a full object in memory.
Retrieve serialized protos from RocksDB and send them back to clients without parsing and reserializing them, ideally as part of another proto, which would serialize only what's necessary.
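The pass-through in the second item can be sketched at the wire-format level without the protobuf library at all. This is a minimal sketch, assuming the stored value is embedded as a length-delimited field (wire type 2) of an outer message; the helper names `AppendVarint` and `AppendSerializedField` are hypothetical and not part of any protobuf API. The payload is still copied once into the output buffer, but it is never parsed or reserialized.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Append `value` as a base-128 varint (the protobuf wire encoding).
inline void AppendVarint(std::uint64_t value, std::string& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Append an already-serialized message as length-delimited field
// `field_number` of an outer message, without parsing the inner bytes.
// (Hypothetical helper for illustration only.)
inline void AppendSerializedField(std::uint32_t field_number,
                                  std::string_view serialized,
                                  std::string& out) {
  assert(field_number >= 1);
  // Tag = (field_number << 3) | wire_type, with wire type 2 = length-delimited.
  AppendVarint((static_cast<std::uint64_t>(field_number) << 3) | 2, out);
  AppendVarint(serialized.size(), out);
  out.append(serialized.data(), serialized.size());  // single copy, no parse
}
```

A server could call `AppendSerializedField` with the bytes read from storage and send the resulting buffer as the response body, provided the outer message declares a message or bytes field with the same field number.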

I want this because parsing and serialization currently take ~30% of my total response time and I don't really need them.

Here's a flame graph profile that shows what I'm seeing.

stellanhaglund commented 7 years ago

Is this the thing that cap'n proto does that makes it faster than protobuf?

xfxyjwf commented 7 years ago

@stellanhaglund No, it's not the main cause of the performance difference. cap'n proto is very similar to FlatBuffers, and what I described in https://github.com/google/protobuf/issues/3296 applies to cap'n proto as well.

johnfb commented 7 years ago

I am very interested in this feature. I have been suggesting at my work that we adopt something like protobuf for a long time. One of the major push backs has been the ability to zero copy large binary/string values. This is because we have many applications where an extra copy or two of the data means the processors/memory bus is now saturated.

Our usual processing stream for data looks a lot like:

DMA from network interface to shared memory
pass off the shared memory reference to the process(es) to do calculations
calculations done from shared memory to shared memory
pass shared memory reference to further process(es)
DMA out of the machine

Control messages and metadata are small enough that copying is no problem (and in fact encoding as JSON etc. is usually good enough). Typical data is large matrices (think 16+ MB) of complex integers (often 16-bit), complex IEEE binary16 (half) or complex IEEE binary32 (float), while the metadata may be 64 bytes in total, encoded as a struct. Note that we often also have the requirement that the data be machine-vector aligned (typically 32-byte alignment). A "slow" data rate is 3-5 Gigabit/s.

It'd be great if we could encode such data as something like protobuf and not have to manually maintain readers and writers and representations in multiple languages. We are already making an effort to use protobuf for control data, which IMO it already excels at.

arthur-tacca commented 7 years ago

Perhaps Cord will / could be open sourced as part of the Abseil library. The initial release doesn't include it, although there is a passing mention in malloc_extension.h.

xfxyjwf commented 7 years ago

@arthur-tacca Yep. The Cord type will be included as part of Abseil. And after we migrate to use Abseil, supporting zero-copy ctypes should be straightforward.

chris-hite commented 6 years ago

Hello, I ran into some performance problems at my previous HFT job and thought it would be nice to have a zero-copy, heap-free protobuf parser.

If I were going to hand write code that parsed a specific protobuf schema, I'd typically do all my processing on the stack and consume all data in one pass.

I could see writing a C++ functional template heavy low level decoder giving me the same performance. I would best describe it as X(name proposals welcome) is to SAX as regular protobuf bindings are to DOM.

instead of heaping a std::string, you get a std::string_view.
if you don't care about a field you should be able to say so at compile time
if you don't want to parse a subobject you should be able to skip it or even break out of parsing
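The three points above can be illustrated with a small callback-driven decoder. This is a minimal sketch, assuming C++17 and handling only wire types 0 (varint) and 2 (length-delimited); the names `ReadVarint` and `Decode` are hypothetical and do not come from protobuf. Length-delimited fields are handed to the caller as `std::string_view` values that alias the input buffer, and a callback returning `false` breaks out of parsing early.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Read one base-128 varint from the front of `in`, advancing it.
// Returns std::nullopt on truncated or over-long input.
inline std::optional<std::uint64_t> ReadVarint(std::string_view& in) {
  std::uint64_t result = 0;
  for (int shift = 0; shift < 64 && !in.empty(); shift += 7) {
    const auto byte = static_cast<unsigned char>(in.front());
    in.remove_prefix(1);
    result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;
}

// Walk the top-level fields of `msg` in a single pass. The callbacks are
// invoked as on_varint(field_number, value) and on_bytes(field_number, view);
// each returns false to stop early. Views alias `msg`, nothing is copied.
// Returns false on malformed input or an unsupported wire type.
template <typename OnVarint, typename OnBytes>
bool Decode(std::string_view msg, OnVarint on_varint, OnBytes on_bytes) {
  while (!msg.empty()) {
    const auto tag = ReadVarint(msg);
    if (!tag) return false;
    const auto field = static_cast<std::uint32_t>(*tag >> 3);
    switch (*tag & 7) {
      case 0: {  // varint
        const auto value = ReadVarint(msg);
        if (!value) return false;
        if (!on_varint(field, *value)) return true;
        break;
      }
      case 2: {  // length-delimited: strings, bytes, sub-messages
        const auto len = ReadVarint(msg);
        if (!len || *len > msg.size()) return false;
        const std::string_view payload =
            msg.substr(0, static_cast<std::size_t>(*len));
        assert(payload.size() == *len);
        msg.remove_prefix(payload.size());
        if (!on_bytes(field, payload)) return true;
        break;
      }
      default:
        return false;  // fixed32, fixed64 and groups are omitted here
    }
  }
  return true;
}
```

Ignoring a field amounts to a callback that does nothing for that field number, and skipping a sub-message is free because its payload is only sliced, never descended into. The views are valid only as long as the input buffer is alive.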

On the generation side I could see doing something similar.

it should be possible for a message routing app to pass payloads without decoding them

Is there interest for this kind of thing? My fear is that C++ guys that really care about performance would avoid protobuf anyway. I guess my target audience are skilled C++ devs worried about performance forced to speak protobuf for historical reasons or a contract with outside components.

Does anyone have a spiffy name?

Has anyone seen something like this? I found lots of alternative wire formats with language bindings: SBE, CapNProto, etc.

toddlipcon commented 6 years ago

I don't think we would switch to using a SAX-like parser except maybe in some very specific circumstances in our project. For us, the overhead of most PBs is negligible (and I expect it to be even lower when we switch to using arenas). The main exception is the std::string allocation of lots of tiny strings -- we're stuck on the pre-C++11 ABI, so every string ends up being a heap allocation/free pair.

FSMaxB commented 6 years ago

I can't use this library without that feature at all!

I use an arena, because I store sensitive key material in my protobuf messages and I provided an allocator with safe memory to the arena (sodium_malloc, not swapped out, zeroed out on free, guard pages etc.).

Given that the key material is stored in bytes fields, protobuf allocates them on the heap in std::string and completely bypasses the safe memory that I want the keys to reside in.

I already halfway ported my code from protobuf-c to protobuf, only now finding out that all my key material completely bypasses the arena. So now it seems like I have to throw that away and stick with protobuf-c (which makes me really unhappy).

tianyapiaozi commented 6 years ago

Any updates on this feature?

gerben-s commented 6 years ago

I think string_view should be a solid contender to be fully released soon.

Cords are a considerably more heavyweight type. Integrating ZCIS with Cords ties our most basic library directly to ABSL. We tread a little more carefully here.

MyUmmaGumma commented 6 years ago

@gerben-s Could you please elaborate what ZCIS/Cord and StringView are with respect to zero-copy?

gerben-s commented 6 years ago

Zero-copy parsing of strings can be achieved by aliasing string_views or Cords with the underlying buffer. Cord is a heavyweight type from the absl lib, which needs to be directly supported by our ZeroCopyInputStream (ZCIS) abstraction.

gerben-s commented 6 years ago

@FSMaxB On the level of safety I understand your wishes, but it's hard for us to make any such guarantee about not storing memory on the heap.

If you have such stringent security demands, I think C++ protobuf is not the right fit.

We are thinking about how to expose aliasing but we want to be careful and expose the right API.

FSMaxB commented 6 years ago

We are thinking about how to expose aliasing but we want to be careful and expose the right API.

That makes sense, especially without std::string_view/std::span

FSMaxB commented 6 years ago

The only way to go for Google would probably be abseil, but that doesn't go well with semantic versioning.

ThomasColthurst commented 5 years ago

My project, https://github.com/google/nucleus, would very much like this feature.

Nucleus is a package for reading and writing genomics data. It relies on another package called htslib to parse some of the more complicated formats like VCF. Unfortunately, htslib insists on putting the parsed data into memory structures it allocates itself, which leaves us the task of copying that data into protocol buffers.

On benchmarks reading a 100M gz-compressed VCF file, this extra copying causes our C++ reader to be almost twice as slow (20 seconds vs. 11 seconds) as another open-source package for reading VCF files (that doesn't rely on protocol buffers and can thus use the htslib allocated memory directly).

prem-nuro commented 5 years ago

https://groups.google.com/forum/#!topic/abseil-io/JzrwSIE_ZSo

With Cord coming soon hopefully this can be unblocked.

msn-tldr commented 5 years ago

Any update on the addition of std::string_view/zero-copy support? That would be really useful. I have a client sending data buffers to a server via gRPC, and right now I can send data only as string/bytes fields in protobufs. The client keeps the data buffers around until the RPC has completed successfully, implying the server has received the data buffer. It would be great to have 'string_view' support in protobufs, so that the client doesn't have to make string copies of these buffers. The buffers are at least a few MBs each, and data-transfer throughput has to be reduced to account for the memory overhead of this copy.

arthur-tacca commented 5 years ago

@msn-tldr If you use the gRPC async API on the client side then you don't need to keep the request object in memory while the call is in progress.

msn-tldr commented 5 years ago

@arthur-tacca I am referring to this async-client example code. Do you mean that after line 72 (the PrepareAsyncSayHello call below), the request can be deallocated (say, if it was on the heap)?

std::unique_ptr<ClientAsyncResponseReader<HelloReply> > rpc(stub_->PrepareAsyncSayHello(&context, request, &cq));

Even if that is what you mean, wouldn't gRPC keep a copy of this request internally (including the data buffer)? Then I still have 2 copies of the buffer, one with gRPC and one with the client app, so that it can resend the buffer if the write fails.

arthur-tacca commented 5 years ago

@msn-tldr Regarding freeing the request object: That is correct. Indeed I asked for an almost identical issue to be clarified in the docs in grpc/grpc.github.io#774

Regarding copying to a buffer: That is correct, and is the nature of protocol buffers. When you want to serialise them, they are serialised to a buffer that includes a complete copy of any bytes objects contained within them.

Another important point: If you have a std::string object and you want to set a member of a protocol buffer object to it, you can use move semantics to avoid a copy i.e. my_protobuf.set_myfield(std::move(my_string_obj)). Of course if you want to set the field to a substring of an existing buffer then this won't help.
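The move-semantics point can be demonstrated without protobuf. This is a minimal sketch in which `Record` is a hypothetical stand-in for a generated message class with a single field named `payload`; it assumes, as the comment above indicates, that the generated setter accepts an rvalue `std::string`. On mainstream standard library implementations, moving a heap-allocated string transfers ownership of its buffer rather than copying the bytes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a generated message with one string/bytes field.
class Record {
 public:
  // Lvalue overload: the bytes are copied into the message.
  void set_payload(const std::string& value) { payload_ = value; }
  // Rvalue overload: the message takes over the caller's buffer.
  void set_payload(std::string&& value) { payload_ = std::move(value); }
  const std::string& payload() const { return payload_; }

 private:
  std::string payload_;
};

// Build a record from a large buffer the caller no longer needs.
inline Record MakeRecord(std::string&& buffer) {
  assert(!buffer.empty());
  Record record;
  record.set_payload(std::move(buffer));  // no byte-wise copy expected
  return record;
}
```

After `MakeRecord(std::move(big))`, `big` is left in a valid but unspecified state, so the caller should not read it again. As the comment notes, this does not help when the field should refer to a substring of a larger buffer.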

(More detail than you probably need/want: technically it is possible to serialise a protocol buffer object to a stream, which means its members are not necessarily copied into a single buffer all at once, but eventually every individual byte will still be copied. Besides, I imagine the synchronous gRPC API probably serialises to a single buffer at once, and the async gRPC API almost certainly does. If allocating such big buffers is a problem, the usual recommendation with gRPC is to break the bytes objects into chunks and use a streaming request to send them. You could use FlatBuffers with gRPC, which would get you zero-copy when reading (the main subject of this issue), but there would still be some copying, e.g. when gRPC makes the system calls that send/receive data. TensorFlow uses gRPC but does some other special tricks to avoid copying tensor data around too much, but I don't believe they are available for use outside that library.)
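The chunking recommendation can be sketched as follows. This is a minimal sketch, assuming C++17; `SplitIntoChunks` is a hypothetical helper, and the chunk size is something the caller would choose. The split itself copies nothing, since each chunk is a view into the original buffer. Each chunk's bytes would still be copied when it is placed into a request message and serialised.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Split `data` into views of at most `chunk_size` bytes each. The views
// alias `data`, so no payload bytes are copied by this function.
inline std::vector<std::string_view> SplitIntoChunks(std::string_view data,
                                                     std::size_t chunk_size) {
  assert(chunk_size > 0);
  std::vector<std::string_view> chunks;
  chunks.reserve((data.size() + chunk_size - 1) / chunk_size);
  for (std::size_t offset = 0; offset < data.size(); offset += chunk_size) {
    // substr clamps the length, so the last chunk may be shorter.
    chunks.push_back(data.substr(offset, chunk_size));
  }
  return chunks;
}
```

A streaming client would then loop over the returned views and write one request message per chunk, so that only one chunk-sized copy is alive at a time instead of a copy of the whole buffer.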

msn-tldr commented 5 years ago

@arthur-tacca thanks for this, it helped.

pcmoritz commented 5 years ago

Is there any update on this feature? This would be very useful. Now that absl has a string_view implementation (https://github.com/abseil/abseil-cpp/blob/master/absl/strings/string_view.h) it seems like that could be used :)

prem-nuro commented 4 years ago

absl::Cord has just been released: https://github.com/abseil/abseil-cpp/commit/3c814105108680997d0821077694f663693b5382

arthur-tacca commented 4 years ago

I think there are two requests here:

(a) Allow ctype = STRING_PIECE
(b) Allow ctype = CORD

The original comment says "(1) opensource ... Cord and its dependencies ... is probably the most difficult part" But surely that's only needed for ctype = CORD? For ctype = STRING_PIECE, a vendored copy of StringPiece has been included in open-source protobuf for years. That leaves items (2) and (3) in the original comment (un-excluding the relevant code from the open-source release). This might be a lot less work than releasing the full feature including CORD, assuming (2) and (3) can reasonably be done for STRING_PIECE without also doing them for CORD.

The ctype = STRING_PIECE feature solves the zero-copy problem in the case where the string you want to refer to without copying is contiguous, which is probably enough functionality for many people (e.g. me 😃). So perhaps, rather than waiting for some solution involving Cord, just the string piece functionality could be open sourced?

I thought this was already well understood, but reading through the comments it seems it hasn't been mentioned here before. The comments mostly discuss alternative types such as std::string_view and absl::Cord, but there's been no mention of protobuf::StringPiece.

acozzette commented 4 years ago

std::string_view has made our StringPiece type obsolete, so I don't think we want to expose StringPiece publicly in any more places if we can avoid it. Eventually we will likely want to replace it with std::string_view. The main problem is that to get access to std::string_view, we need to require C++17 (currently we only require C++11). The other possibility is to depend on ABSL and use absl::string_view, but that would be a non-trivial change as well.

arthur-tacca commented 4 years ago

That makes sense, thanks.

pitrou commented 4 years ago

There are also standalone string_view backports. In Arrow we use https://github.com/martinmoene/string-view-lite successfully.

fm123456 commented 4 years ago

I can't use this library without that feature at all!

I use an arena, because I store sensitive key material in my protobuf messages and I provided an allocator with safe memory to the arena (sodium_malloc, not swapped out, zeroed out on free, guard pages etc.).

Given that the key material is stored in bytes fields, protobuf allocates them on the heap in std::string and completely bypasses the safe memory that I want the keys to reside in.

I already halfway ported my code from protobuf-c to protobuf, only now finding out that all my key material completely bypasses the arena. So now it seems like I have to throw that away and stick with protobuf-c (which makes me really unhappy).

I also encountered the same problem. Have you solved it?

FSMaxB commented 4 years ago

I also encountered the same problem, have you solved it?

Yes, by first porting my code back to protobuf-c. Later abandoning the entire project and then never using protobuf ever again in the future.

toddlipcon commented 4 years ago

Just in case anyone finds it useful, I did a little hacking on a branch that supports storing string buffers in arenas: https://github.com/toddlipcon/protobuf/commit/00cc3104a648c46bfa41a44391a9f7571a0694df

The above only supports it on the serialization side -- i.e. if you call proto_on_arena.set_foo(const std::string& bar) it will copy bar's contents on the Arena and make a std::string-compatible-memory-layout object to point to it. Note that it's also specific to libstdcxx c++11 string ABI and won't work with libc++ or other ABIs (though presumably could be modified to support those as well).
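For readers who only need the general idea of keeping string bytes inside a caller-controlled memory region, the standard library offers a related but different technique. This is a minimal sketch, assuming C++17 `std::pmr`; it is not the approach taken in the branch above, it does not involve `google::protobuf::Arena`, and the helper names `CopyIntoArena` and `PointsInto` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <string>
#include <string_view>

// Copy `value` into a string whose character buffer is allocated from
// `arena` instead of the global heap (for strings too long for the
// small-string optimisation).
inline std::pmr::string CopyIntoArena(std::string_view value,
                                      std::pmr::memory_resource* arena) {
  assert(arena != nullptr);
  return std::pmr::string(value.data(), value.size(), arena);
}

// True if `p` points inside the byte range [begin, begin + size).
inline bool PointsInto(const void* p, const std::byte* begin,
                       std::size_t size) {
  const auto* b = static_cast<const std::byte*>(p);
  // std::less gives a well-defined total order even for unrelated pointers.
  return !std::less<const std::byte*>{}(b, begin) &&
         std::less<const std::byte*>{}(b, begin + size);
}
```

With a `std::pmr::monotonic_buffer_resource` built over a fixed buffer and `std::pmr::null_memory_resource()` as its upstream, any allocation that does not fit in the buffer throws instead of silently falling back to the heap. Short strings may still live inline in the string object itself, so where that object is placed matters as well.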

jeaye commented 3 years ago

This is a huge deal, especially on mobile. Now that absl::string_view exists, what is next for getting a zero-copy API?

chys87 commented 3 years ago

I'm very interested in this feature. I have a project where we embed long strings (several KiBs) in protobuf messages. It would significantly save CPU time if this feature is available.

danieljennings commented 2 years ago

Chiming in to say that we use Protobuf in virtually all of our projects here and would love to see this fixed, even if it required upgrading to C++17 (we're only on C++14 for the most part now.)

mayur-who commented 2 years ago

I would also love to see this implemented. We could then use protobuf for our data plane APIs as well.

troberti commented 2 years ago

We would also really like to see std::string_view support as well. Would make arenas actually useful.

fowles commented 2 years ago

We have a lot of long-term plans that will drive us towards this space; however, the migrations required make it slow going. Expect to see us start breaking ground over the next year.

GOGOYAO commented 1 year ago

Looking forward to this feature.

fowles commented 1 year ago

Support for absl::Cord landed in the spring. The next major step will come with editions, which have started to land on main. Once we have a release that fully supports editions (likely October or January), we plan to expose a mechanism for using absl::string_view as the API for strings. After that we can revisit this to see what is missing from fully realizing this request.

HamzaHajeir commented 1 year ago

This would be a very vital feature, really.

I just wrote a question on StackOverflow:
"I'm looking into adopting Protobuf in my embedded IoT framework, where a message can be received from network sources such as MQTT/HTTP/etc. and fed to the system.

I seek to fully process the incoming data without copying it, so the intended use is to feed protobuf with the starting address std::uint8_t* and size.

The intended output of array data (strings and raw data) would be std::string_view and std::span respectively, which would point to the received data."

For embedded systems copying yields more heap fragmentation, and with large messages, this becomes worse.

I really can't see a reason why it's not already built, other than C++17 and later not being supported (though that could be an optional compile-time option).

neuliyiping commented 12 months ago

Just in case anyone finds it useful, I did a little hacking on a branch that supports storing string buffers in arenas: toddlipcon@00cc310

The above only supports it on the serialization side -- i.e. if you call proto_on_arena.set_foo(const std::string& bar) it will copy bar's contents on the Arena and make a std::string-compatible-memory-layout object to point to it. Note that it's also specific to libstdcxx c++11 string ABI and won't work with libc++ or other ABIs (though presumably could be modified to support those as well).

I tried this, but it does not work: the string buffers are still on the heap.

github-actions[bot] commented 5 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

AlexeySalmin commented 5 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This bugreport is going to school by now.

follesoe commented 2 months ago

Why tease us with the possibility of a zero-copy API, and then let it hang and dangle like this 😅

protocolbuffers / protobuf

Opensource C++ zero-copy API #1896