ludocode / mpack

MPack - A C encoder/decoder for the MessagePack serialization format / msgpack.org[C]
MIT License

Stream parser that doesn't buffer the entire message #105

Open evpopov opened 1 year ago

evpopov commented 1 year ago

Hi, parts of this may have been touched on in #100, but I wanted to start a clean discussion here.

I'm trying to redesign an RPC-like protocol to use MessagePack. The protocol has historically been based on TLVs and runs on a bare-metal system that is tightly constrained on memory. The idea is for my device to accept calls over a TCP stream. Each call consists of a command, parameters, data, etc., and the device executes whatever is requested. One of those requests could be a file upload or firmware upgrade, and such an RPC call would be hundreds of KB, if not a megabyte or two. I don't have anywhere near enough RAM to buffer the entire message. Historically, I'd have separate TLVs for the command and the data, which tells me how long each section is and lets me skip over parts of the request that I don't need to parse. I'd parse the command TLV first, so I'd know that, for example, the data TLV needs to be saved to a file while it is being received a few hundred bytes at a time. TLVs also help when transferring data over TCP, because TCP is stream-based and in many cases I need to know the size of the transfer ahead of time. MessagePack gives me a similar benefit here, because str, bin, and ext objects are prefixed with their byte length (arrays and maps are prefixed with an element count).
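
For reference, the existing TLV handling described above can be sketched roughly as follows. The wire layout (1-byte type, 4-byte big-endian length) and every name in it (`tlv_stream`, `tlv_stream_feed`, `tlv_count_bytes`) are assumptions made for illustration, not the real protocol. The point is only that the parser keeps a few bytes of state and hands value bytes to a callback as they arrive, so nothing larger than the header is ever buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: 1-byte type, 4-byte big-endian length, value. */
enum { TLV_HEADER_SIZE = 5 };

/* Called with each piece of a value as it arrives. */
typedef void (*tlv_chunk_fn)(uint8_t type, const uint8_t *data, size_t len,
                             void *ctx);

typedef struct {
    uint8_t      header[TLV_HEADER_SIZE]; /* partially received header */
    size_t       header_have;             /* header bytes collected so far */
    uint32_t     value_left;              /* value bytes still expected */
    tlv_chunk_fn on_chunk;
    void        *ctx;
} tlv_stream;

void tlv_stream_init(tlv_stream *s, tlv_chunk_fn on_chunk, void *ctx) {
    assert(s != NULL && on_chunk != NULL);
    s->header_have = 0;
    s->value_left = 0;
    s->on_chunk = on_chunk;
    s->ctx = ctx;
}

/* Consume whatever is available. Never blocks and never stores value bytes.
 * Zero-length values produce no callback. */
void tlv_stream_feed(tlv_stream *s, const uint8_t *data, size_t len) {
    assert(s != NULL && (data != NULL || len == 0));
    while (len > 0) {
        if (s->header_have < TLV_HEADER_SIZE) {
            /* Still collecting the header, one byte at a time. */
            s->header[s->header_have++] = *data++;
            len--;
            if (s->header_have == TLV_HEADER_SIZE) {
                s->value_left = ((uint32_t)s->header[1] << 24) |
                                ((uint32_t)s->header[2] << 16) |
                                ((uint32_t)s->header[3] << 8) |
                                (uint32_t)s->header[4];
                if (s->value_left == 0)
                    s->header_have = 0; /* empty value: expect next TLV */
            }
            continue;
        }
        /* Inside a value: pass through as much as this chunk contains. */
        size_t n = len < s->value_left ? len : (size_t)s->value_left;
        s->on_chunk(s->header[0], data, n, s->ctx);
        data += n;
        len -= n;
        s->value_left -= (uint32_t)n;
        if (s->value_left == 0)
            s->header_have = 0; /* value complete: expect next TLV */
    }
}

/* Example sink: counts value bytes; ctx points to a size_t total. */
void tlv_count_bytes(uint8_t type, const uint8_t *data, size_t len, void *ctx) {
    (void)type;
    (void)data;
    *(size_t *)ctx += len;
}
```

A real sink would switch on `type`, for example writing the data TLV to flash and ignoring types it does not care about, which is the "skip" behaviour mentioned above.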

I'm trying to redesign this TLV-based approach and have the entire request encoded inside a MessagePack message, because this makes the protocol more "standard" rather than being defined by ad hoc proprietary TLVs. To do this, I need to be able to "feed" the parser with arbitrary amounts of data as it becomes available, while at the same time handling whatever the parser has decoded so far. After digging through the very good manual and trying the different APIs, I'm almost finding what I need, but not quite.

Node API: mpack_tree_init_stream() seems to be almost exactly what I need, because I can simulate the "feeding" functionality through the read_fn() function and use mpack_tree_try_parse() to handle objects as they get decoded. The read_fn() function can return zero if I don't have any new data, and life is good. Except that the Node API expects to be able to buffer the whole message, and in my case the message may be a multi-megabyte file upload.

Reader API with fill and skip functions: This API gives me the freedom to parse the objects as they come in, which is ideal, but the fill function is not allowed to return zero. The problem is that my task cannot block; it has other things to do. Periodically, it checks for new data from the socket and can "feed" that new data to the parser, but I simply cannot block the task. I thought maybe I could call mpack_read_tag() only once I have accumulated some data, but there is no way to know how many bytes mpack_read_tag() will want to consume.
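
On that last point, the size of a tag can in principle be derived from its first byte using the MessagePack format specification. Below is a minimal sketch of such a check. The helpers `msgpack_tag_size()` and `msgpack_tag_is_buffered()` are hypothetical and not part of MPack, and whether the result matches byte-for-byte what mpack_read_tag() consumes is an assumption that would need checking against the MPack source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an MPack function.
 * Bytes occupied by the tag that starts with `first`, per the MessagePack
 * format spec: the type byte plus any length, ext-type, or scalar value
 * bytes. Payloads of str/bin/ext and children of array/map are NOT counted.
 * Returns 0 for 0xc1, which the spec marks as never used. */
size_t msgpack_tag_size(uint8_t first) {
    if (first <= 0xbf || first >= 0xe0)
        return 1; /* fixint, fixmap, fixarray, fixstr */
    switch (first) {
    case 0xc0: case 0xc2: case 0xc3:            /* nil, false, true */
        return 1;
    case 0xc4: case 0xd9:                       /* bin8, str8 */
    case 0xcc: case 0xd0:                       /* uint8, int8 */
    case 0xd4: case 0xd5: case 0xd6:
    case 0xd7: case 0xd8:                       /* fixext: type byte follows */
        return 2;
    case 0xc5: case 0xda:                       /* bin16, str16 */
    case 0xdc: case 0xde:                       /* array16, map16 */
    case 0xcd: case 0xd1:                       /* uint16, int16 */
    case 0xc7:                                  /* ext8: length + type */
        return 3;
    case 0xc8:                                  /* ext16: length + type */
        return 4;
    case 0xc6: case 0xdb:                       /* bin32, str32 */
    case 0xdd: case 0xdf:                       /* array32, map32 */
    case 0xca: case 0xce: case 0xd2:            /* float32, uint32, int32 */
        return 5;
    case 0xc9:                                  /* ext32: length + type */
        return 6;
    case 0xcb: case 0xcf: case 0xd3:            /* float64, uint64, int64 */
        return 9;
    default:                                    /* 0xc1: never used */
        return 0;
    }
}

/* True if `buf` holds at least one complete tag. Returns false for an empty
 * buffer and for the invalid byte 0xc1; callers should detect 0xc1
 * separately, otherwise they would wait forever. */
bool msgpack_tag_is_buffered(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len == 0)
        return false;
    size_t need = msgpack_tag_size(buf[0]);
    return need != 0 && len >= need;
}
```

With something like this, the task could accumulate socket data in a small buffer and attempt a tag read only when `msgpack_tag_is_buffered()` returns true. Payload bytes of a large bin or str would still have to be streamed separately in whatever chunks are available.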

Am I missing something? Is there a way to parse a stream and handle the data a few bytes at a time without buffering the entire message?

Thanks in advance