protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev

Delimited Messages - let's harmonize across languages #10229

Open otri opened 2 years ago

otri commented 2 years ago

What language does this apply to? If it's a proto syntax change, is it for proto2 or proto3?

No syntax change.

If it's about generated code change, what programming language?

All Languages

Describe the problem you are trying to solve.

Delimited messages are so central to serializing repeated payloads to files and network streams that this should be treated as a core use case.

Describe the solution you'd like

Coming to delimited-message parsing fresh (and pulling my hair out), I missed that the C++ functions live in delimited_message_util.h. However, the solution Kenton presented in https://github.com/protocolbuffers/protobuf/pull/710 is, IMHO, much better and more obvious.

Please harmonize this small but critical delimited functionality with codegen, and let's get it mainlined across Python, Java, C++, and C#. I'm using length-delimited reading/writing in three of those four languages, which shows how valuable the cross-platform nature of protocol buffers is for delimited messages.

Describe alternatives you've considered

It varies, but tucking the delimited reading/writing into C++ utils is confusing. It's missing from Python, so I always end up rolling my own, while length-delimited functions are present in Java and C#. They all share the same framing: a uint32 length, encoded as a base-128 varint, in front of each serialized message.
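For reference, the shared framing is simple. The message's byte length is written as a base-128 varint (7 bits per byte, least-significant group first, high bit set on every byte except the last), and the serialized message follows it. Below is a minimal pure-Python sketch of the write side that works on already-serialized bytes. `encode_varint` and `frame` are hypothetical helper names, not part of any protobuf API:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varint length must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def frame(payload: bytes) -> bytes:
    """Prefix an already-serialized message with its varint-encoded length."""
    return encode_varint(len(payload)) + payload


# Least-significant 7-bit group comes first, so 300 encodes as 0xAC 0x02.
print(encode_varint(300).hex())  # ac02
print(frame(b"\x08\x96\x01"))    # b'\x03\x08\x96\x01'
```

Because the prefix is a varint rather than a fixed 4-byte integer, small messages pay only one byte of overhead and there is no endianness choice to disagree on.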

Additional context

Not at this time.

fowles commented 2 years ago

Honestly, this is a totally reasonable request. We don't have the cycles to pursue it right now, but if you were interested in implementing it we would be happy to accept PRs.

acozzette commented 2 years ago

We have an internal implementation of this for Python that we might want to just open source.

neild commented 2 years ago

FYI, existing proposal for adding this to the Go implementation: https://github.com/golang/protobuf/issues/1382

jskeet commented 2 years ago

C# already has ParseDelimitedFrom(Stream) - do we anticipate any need for other changes?

rmelick-muon commented 11 months ago

@acozzette What is the process like for open sourcing your internal implementation? I'm happy to try contributing something for Python as a brand-new standalone patch, but if there is existing code that could be released, that might be more expedient.

anandolee commented 10 months ago

I've put the following APIs

def serialize_length_prefixed(message, output) -> None
def parse_length_prefixed(message, input_bytes) -> message

into the design doc for one of our projects. I'll add support once the design has been approved.
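This is a minimal sketch of how the proposed pair could behave. It assumes only that `message` has the usual `SerializeToString()` / `ParseFromString()` methods and that `output` / `input_bytes` are binary streams such as `io.BytesIO`. It follows the signatures above by filling in a caller-supplied message. Returning `None` at end of stream is an assumed convention. This is an illustration, not the implementation that was eventually merged, and `_write_varint`, `_read_varint` and `RawBytes` are hypothetical helpers:

```python
import io


def _write_varint(out, value: int) -> None:
    # Base-128 varint: 7 bits per byte, least-significant group first,
    # high bit set on every byte except the last.
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))
        else:
            out.write(bytes([byte]))
            return


def _read_varint(inp):
    """Decode one varint from a binary stream; return None on clean EOF."""
    result = shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None  # stream ended between records
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def serialize_length_prefixed(message, output) -> None:
    """Write len(serialized) as a varint, then the serialized message."""
    data = message.SerializeToString()
    _write_varint(output, len(data))
    output.write(data)


def parse_length_prefixed(message, input_bytes):
    """Fill `message` from the next record in the stream; return None at EOF."""
    size = _read_varint(input_bytes)
    if size is None:
        return None
    data = input_bytes.read(size)
    if len(data) != size:
        raise EOFError("truncated message body")
    message.ParseFromString(data)
    return message


class RawBytes:
    """Hypothetical stand-in for a generated message, used only for the demo."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data

    def SerializeToString(self) -> bytes:
        return self.data

    def ParseFromString(self, data: bytes) -> None:
        self.data = data


buf = io.BytesIO()
for payload in (b"abc", b""):
    serialize_length_prefixed(RawBytes(payload), buf)
buf.seek(0)
while (msg := parse_length_prefixed(RawBytes(), buf)) is not None:
    print(msg.data)  # b'abc', then b''
```

With a real generated class you would pass an instance of that class instead of `RawBytes`. Note that an empty message is still framed (as a single `0x00` byte), so end of stream is the only unambiguous stop signal.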

rmelick-muon commented 10 months ago

@anandolee I've implemented very similar APIs in our internal code, and what I quickly discovered for parsing was that I also needed an API that could parse multiple messages from a stream of bytes (for example, an open file).

These would look something like

def parse_all_delimited_from(buffer: bytes, message_class: Type[M]) -> Iterator[M]:

def parse_all_delimited_from_reader(
    reader: BufferedReader, message_class: Type[M]
) -> Iterator[M]:
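One way to sketch both iterators is shown below. It assumes `message_class` is a generated message class (a no-arg constructor plus `ParseFromString()`) and that the reader is any binary file object. The length prefix is read one byte at a time, so a buffered reader (as in the proposed signature) keeps that cheap. The in-memory variant simply wraps the bytes in `io.BytesIO`. `_read_varint` and `RawBytes` are hypothetical helpers for illustration:

```python
import io
from typing import BinaryIO, Iterator, Optional, Type, TypeVar

M = TypeVar("M")


def _read_varint(reader: BinaryIO) -> Optional[int]:
    """Decode one base-128 varint; return None on clean EOF."""
    result = shift = 0
    while True:
        b = reader.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def parse_all_delimited_from_reader(
    reader: BinaryIO, message_class: Type[M]
) -> Iterator[M]:
    """Yield one parsed message per length-prefixed record until EOF."""
    while (size := _read_varint(reader)) is not None:
        data = reader.read(size)
        if len(data) != size:
            raise EOFError("truncated message body")
        msg = message_class()
        msg.ParseFromString(data)
        yield msg


def parse_all_delimited_from(buffer: bytes, message_class: Type[M]) -> Iterator[M]:
    """In-memory variant: wrap the bytes in a stream and reuse the reader."""
    return parse_all_delimited_from_reader(io.BytesIO(buffer), message_class)


class RawBytes:
    """Hypothetical stand-in for a generated message class (demo only)."""

    def __init__(self) -> None:
        self.data = b""

    def ParseFromString(self, data: bytes) -> None:
        self.data = data


msgs = parse_all_delimited_from(b"\x02hi\x00\x03abc", RawBytes)
print([m.data for m in msgs])  # [b'hi', b'', b'abc']
```

Because the reader variant is a generator, a large file is processed one record at a time rather than loaded whole.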

Or, a parse method that tells you how many bytes it consumed, so you can advance your position in a large buffer and then parse the next message:

def parse_delimited_from(
    buffer: bytes, starting_position: int, message: Message
) -> int:
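Here is a sketch of the offset-based variant. As described above, it returns the bytes consumed (length prefix plus body). `message` only needs `ParseFromString()`. The `Message` type hint is dropped so the sketch runs without protobuf installed. `RawBytes` is a hypothetical stand-in used for the demo, and raising `ValueError` on truncated input is an assumed error convention:

```python
def parse_delimited_from(buffer: bytes, starting_position: int, message) -> int:
    """Parse one length-prefixed record at starting_position into `message`.

    Returns the number of bytes consumed (varint prefix + body) so the caller
    can advance its offset.
    """
    pos = starting_position
    size = shift = 0
    while True:  # decode the base-128 varint length prefix
        if pos >= len(buffer):
            raise ValueError("truncated length prefix")
        b = buffer[pos]
        pos += 1
        size |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")
    end = pos + size
    if end > len(buffer):
        raise ValueError("truncated message body")
    message.ParseFromString(buffer[pos:end])
    return end - starting_position


class RawBytes:
    """Hypothetical stand-in for a generated message (demo only)."""

    def __init__(self) -> None:
        self.data = b""

    def ParseFromString(self, data: bytes) -> None:
        self.data = data


buf = b"\x02hi\x03abc"
offset, seen = 0, []
while offset < len(buf):
    msg = RawBytes()
    offset += parse_delimited_from(buf, offset, msg)
    seen.append(msg.data)
print(seen)  # [b'hi', b'abc']
```

Returning a byte count rather than a new absolute offset matches the description above. Either choice lets the caller walk a large buffer without re-slicing it for every record.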
thomasvl commented 9 months ago

FYI, ObjC (and Swift) both have APIs for delimited messages.

anandolee commented 3 months ago

Length-prefixed serialization and parsing for Python is now supported as of https://github.com/protocolbuffers/protobuf/commit/3a9f0743ea8d82f489a65f7d087fa01d26ac5f56

jesseclark commented 1 month ago

Not having parse_delimited_from in the Ruby library is quite painful. Any hope of getting this implemented any time soon?