msgpack / msgpack-c

MessagePack implementation for C and C++ / msgpack.org[C/C++]

Add formatted {,un}packers #393

Open HalosGhost opened 8 years ago

HalosGhost commented 8 years ago

This is a significant addition to the specification, and may very well not be something many people are interested in. However, it is one of my favorite features of another serialization library I know and love, and I would adore seeing it added to msgpack-proper.

Instead of having to make multiple calls to the various msgpack_pack_<type>() functions, it would be incredible to have a function that takes a format specifier and a list of things to be serialized (à la printf()); for example, being able to call something like the following and have it correctly pack both values:

msgpack_packf(pk, "i8,b", 25, true);
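To make the request concrete, here is a minimal sketch of what such a function could look like. It is not part of msgpack-c: `sketch_buf`, `put_byte`, and `sketch_packf` are hypothetical names, the buffer is a plain byte array instead of a `msgpack_packer`, and only the two tokens from the example (`i8` and `b`) are handled. The byte values follow the MessagePack format (fixint, `int 8` as `0xd0`, `false`/`true` as `0xc2`/`0xc3`).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity output buffer; not a msgpack-c type. */
typedef struct {
    uint8_t data[64];
    size_t  len;
} sketch_buf;

/* Append one byte; returns 0 on success, -1 if the buffer is full. */
static int put_byte(sketch_buf *b, uint8_t byte) {
    if (b->len >= sizeof b->data) return -1;
    b->data[b->len++] = byte;
    return 0;
}

/* Hypothetical formatted packer. Understands only "i8" and "b",
 * separated by commas. Returns 0 on success, -1 on error. */
static int sketch_packf(sketch_buf *b, const char *fmt, ...) {
    va_list ap;
    int rc = 0;

    assert(b != NULL && fmt != NULL);
    va_start(ap, fmt);
    while (*fmt != '\0' && rc == 0) {
        if (strncmp(fmt, "i8", 2) == 0) {
            /* int8_t is promoted to int when passed through "...". */
            int v = va_arg(ap, int);
            if (v < -128 || v > 127) { rc = -1; break; }
            if (v < -32) rc = put_byte(b, 0xd0);        /* int 8 marker */
            if (rc == 0) rc = put_byte(b, (uint8_t)v);  /* fixint or payload */
            fmt += 2;
        } else if (*fmt == 'b') {
            int v = va_arg(ap, int);  /* bool is promoted to int as well */
            rc = put_byte(b, v ? 0xc3 : 0xc2);
            fmt += 1;
        } else {
            rc = -1;  /* unknown token */
        }
        if (rc == 0 && *fmt == ',') fmt++;
    }
    va_end(ap);
    return rc;
}
```

Under these assumptions, `sketch_packf(&buf, "i8,b", 25, true)` appends two independent values, the same bytes that two separate pack calls would produce. Nothing here checks that the arguments match the format string, which is the usual weakness of printf()-style interfaces.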

Similarly, a formatted unpacker (à la scanf()) would offer the same benefits when parsing msgpack-serialized objects.
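The scanf()-style counterpart could be sketched as follows. Again, this is an illustration only: `sketch_unpackf` and `to_i8` are hypothetical names, the input is a flat byte array, and only the scalar tokens `i8` and `b` are recognized. Nested arrays and maps are deliberately left out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw byte as a two's-complement 8-bit value, portably. */
static int8_t to_i8(uint8_t byte) {
    return (int8_t)(byte >= 0x80 ? (int)byte - 256 : (int)byte);
}

/* Hypothetical formatted unpacker over a flat byte stream.
 * Understands only "i8" and "b", separated by commas.
 * Like scanf(), it returns the number of values it stored. */
static int sketch_unpackf(const uint8_t *in, size_t len, const char *fmt, ...) {
    va_list ap;
    size_t pos = 0;
    int stored = 0;

    assert(in != NULL && fmt != NULL);
    va_start(ap, fmt);
    while (*fmt != '\0' && pos < len) {
        uint8_t tag = in[pos];
        if (strncmp(fmt, "i8", 2) == 0) {
            int8_t *out = va_arg(ap, int8_t *);
            if (tag <= 0x7f || tag >= 0xe0) {           /* positive or negative fixint */
                *out = to_i8(tag);
                pos += 1;
            } else if (tag == 0xd0 && pos + 1 < len) {  /* int 8: marker + payload */
                *out = to_i8(in[pos + 1]);
                pos += 2;
            } else {
                break;  /* type mismatch or truncated input */
            }
            fmt += 2;
        } else if (*fmt == 'b') {
            bool *out = va_arg(ap, bool *);
            if (tag != 0xc2 && tag != 0xc3) break;      /* not a boolean */
            *out = (tag == 0xc3);
            pos += 1;
            fmt += 1;
        } else {
            break;  /* unknown token */
        }
        stored++;
        if (*fmt == ',') fmt++;
    }
    va_end(ap);
    return stored;
}
```

A caller would pass pointers, as with scanf(): `sketch_unpackf(bytes, n, "i8,b", &number, &flag)` returns 2 when both values were read, and a smaller count when the stream does not match the format.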

Is there any interest in this other than my own?

halfwit commented 8 years ago

I would also be interested

redboltz commented 8 years ago

Hi @HalosGhost, I'm interested in your idea. I considered how to implement that. Then some questions came up.

For packing,

  1. Does msgpack_packf(pk, "i8,b", 25, true); produce two MessagePack objects, or one ARRAY that contains two MessagePack objects?
  2. How should aggregate types such as ARRAY and MAP be treated?
  3. How should STR, BIN, and EXT be passed? They would require a length.
  4. How can a mismatch between the format string and the types of the arguments that follow be detected?

Unpacking is more difficult than packing. I think that unpacking variables according to a format directly from a msgpack-formatted byte stream is not practical (too complicated). Converting the variables from a msgpack_object is easier, but still difficult, because a msgpack_object is a composite structure. It could contain a map of msgpack_objects, and an element of that map could itself be an array of msgpack_objects, and so on. So it's hard to design a format string.

I would like to hear any ideas for resolving these questions :)

HalosGhost commented 8 years ago

@redboltz, I suppose that example would be equivalent to calling the two respective packing functions one after the other; as a result, it would produce two message pack objects (assuming I understand correctly).

The library which provides this functionality (you may have guessed) is a json library, so the syntax for arrays and maps is well-defined. I do not want these functions to simply become a json-to-msgpack converter, but we could lift some of json's syntax to suit the needs:

[i8,bool] would be an array consisting of those two elements rather than those two elements separately, and {i8:bool} would be a single-member map with the i8 as the key and the bool as the value.

Strings could have a modifier like # signaling that the next value passed after the string itself is the string's length (similar things can be done for binary and extensions).
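The bracket and `#` ideas can be sketched on top of a small recursive format parser. Everything below is hypothetical (`sketch_buf`, `count_elems`, `pack_items`, `sketch_packf`), maps are omitted, and the short token `b` from the first example stands in for `bool`. To keep the sketch small it only emits the compact MessagePack forms: fixarray (`0x90 | n`, up to 15 elements) and fixstr (`0xa0 | n`, up to 31 bytes).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity output buffer; not a msgpack-c type. */
typedef struct { uint8_t data[128]; size_t len; } sketch_buf;

static int put(sketch_buf *b, const void *src, size_t n) {
    if (n > sizeof b->data - b->len) return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

static int put_byte(sketch_buf *b, uint8_t v) { return put(b, &v, 1); }

/* Count the elements of an array; p points just past its '['.
 * Returns -1 if the closing ']' is missing. */
static int count_elems(const char *p) {
    int depth = 0, commas = 0;
    bool seen = false;
    for (; *p != '\0'; p++) {
        if (*p == ']' && depth == 0) return seen ? commas + 1 : 0;
        if (*p == '[') depth++;
        else if (*p == ']') depth--;
        else if (*p == ',' && depth == 0) commas++;
        seen = true;
    }
    return -1;
}

/* Pack items until the end of the format or an unmatched ']'. */
static int pack_items(sketch_buf *b, const char **fmt, va_list *ap) {
    while (**fmt != '\0' && **fmt != ']') {
        const char *f = *fmt;
        if (strncmp(f, "i8", 2) == 0) {
            int v = va_arg(*ap, int);              /* promoted int8_t */
            if (v < -128 || v > 127) return -1;
            if (v < -32 && put_byte(b, 0xd0) != 0) return -1;
            if (put_byte(b, (uint8_t)v) != 0) return -1;
            *fmt = f + 2;
        } else if (f[0] == 's' && f[1] == '#') {
            /* '#' means: the next argument is the length, as a size_t. */
            const char *s = va_arg(*ap, const char *);
            size_t n = va_arg(*ap, size_t);
            if (s == NULL || n > 31) return -1;    /* sketch: fixstr only */
            if (put_byte(b, (uint8_t)(0xa0 | n)) != 0) return -1;
            if (put(b, s, n) != 0) return -1;
            *fmt = f + 2;
        } else if (f[0] == 'b') {
            int v = va_arg(*ap, int);              /* promoted bool */
            if (put_byte(b, v ? 0xc3 : 0xc2) != 0) return -1;
            *fmt = f + 1;
        } else if (f[0] == '[') {
            int n = count_elems(f + 1);
            if (n < 0 || n > 15) return -1;        /* sketch: fixarray only */
            if (put_byte(b, (uint8_t)(0x90 | n)) != 0) return -1;
            *fmt = f + 1;
            if (pack_items(b, fmt, ap) != 0 || **fmt != ']') return -1;
            (*fmt)++;                              /* consume ']' */
        } else {
            return -1;                             /* unknown token */
        }
        if (**fmt == ',') (*fmt)++;
    }
    return 0;
}

/* Hypothetical formatted packer with arrays and length-tagged strings. */
static int sketch_packf(sketch_buf *b, const char *fmt, ...) {
    va_list ap;
    int rc;
    assert(b != NULL && fmt != NULL);
    va_start(ap, fmt);
    rc = pack_items(b, &fmt, &ap);
    if (rc == 0 && *fmt != '\0') rc = -1;          /* stray ']' */
    va_end(ap);
    return rc;
}
```

With this sketch, `sketch_packf(&buf, "[i8,b],s#", 25, true, "hi", (size_t)2)` emits one two-element array followed by one string. The explicit `(size_t)` cast matters: a bare `2` would be passed as an `int`, and reading it back as a `size_t` is undefined behavior. The array header has to be written before its elements, which is why the parser counts elements up front.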

As for the type comparison, it could be done in one of a few ways: msgpack could either bite the bullet and accept glibc/gcc as a dependency (which I do not recommend) and use typeof, or, perhaps this might be a perfect chance to use C11's _Generic. Alternatively, you could try to pull off type-checking in a similar way to printf().

If you would like a clearer picture of what I am imagining, take a look at Jansson; it is the {,de}serialization library I use for json, and it provides these features. It makes working with json so much simpler that I have trouble imagining working with it in any other way (while using C, that is).
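As a rough illustration of the _Generic route, the sketch below maps the static type of an argument to the format token it would be expected to match. The macro names `SKETCH_TOKEN` and `SKETCH_MATCHES` and the token spellings (`i32`, `s`) are assumptions made for this example; it requires a C11 compiler and says nothing about how msgpack-c would actually do it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: map the static type of an expression to a format token. */
#define SKETCH_TOKEN(x) _Generic((x), \
    bool:         "b",                \
    int8_t:       "i8",               \
    int32_t:      "i32",              \
    char *:       "s",                \
    const char *: "s",                \
    default:      "?")

/* True when the given token matches the argument's static type. */
#define SKETCH_MATCHES(token, x) (strcmp((token), SKETCH_TOKEN(x)) == 0)

/* Small self-check showing how the macros behave on typed variables. */
static void sketch_demo(void) {
    int8_t small = 25;
    bool flag = true;
    const char *text = "hi";

    assert(SKETCH_MATCHES("i8", small));
    assert(SKETCH_MATCHES("b", flag));
    assert(SKETCH_MATCHES("s", text));
    assert(!SKETCH_MATCHES("b", small));  /* mismatch is detected */
}
```

One limit of this approach is that it only sees static types. Before C23, the constant `true` is an `int`, so a literal `true` would map to `"i32"`, not `"b"`. A wrapper macro would also have to split the format string and pair each token with its argument, which is not shown here.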

Note: Jansson does actually provide a formatted unpacker.

redboltz commented 8 years ago

@HalosGhost, I've checked Jansson. It looks like great work :) I think that I understand your vision. If you implement those functions and tests, please send a pull request. I will review it.

HalosGhost commented 8 years ago

@redboltz, sounds good! I will see what I can do.