msgpack / msgpack-c

MessagePack implementation for C and C++ / msgpack.org[C/C++]
Other
3.03k stars 883 forks source link

Support alternative serialization formats #414

Closed oberstet closed 8 years ago

oberstet commented 8 years ago

autobahn-cpp is a WAMP client library, and uses msgpack-c to support MessagePack serialization variants of WAMP.

WAMP in addition supports JSON and CBOR serialization formats. E.g. Crossbar.io, a WAMP router, supports all 3 serialization format, and is able to transparently translate between different serialization formats.

In autobahn-cpp, we had some discussions about how to support JSON and CBOR.

One aspect that came up is that msgpack-c actually provides 2 things in 1:

  1. a MessagePack serializer/unserializer
  2. a Variant type value library

Eg autobahn-cpp was perviously using a Boost Variant type in it's public API, but now uses msgpack-c for that.

I was wondering if we could retain 2. as a public API type in autobahn-cpp, while expanding 1. for supporting not only MessagePack, but also JSON and CBOR - transparently.

Users of autobahn-cpp would continue programming using msgpack-c types, and serialization format negotiation could happen dynamically and completely "under the hood" - without any user code changes.

Is this something that has been considered for msgpack-c?

jpetso commented 8 years ago

I think a more practical variant of this feature request could be for msgpack-c to support a SAX-style deserialization API.

In https://github.com/tplgy/bonefish/blob/master/src/bonefish/serialization/json_serializer.cpp and https://github.com/tplgy/json-msgpack/blob/master/include/json_msgpack/json_msgpack_sax.hpp, I use RapidJSON's SAX API to read JSON and write directly into a msgpack::object, without the need for an intermediate JSON document object. It would be nice if msgpack also offered an API that lets users parse into their own variants (e.g. JSON document) without having a msgpack::object created first.

For writing msgpack, the Packer API already is a SAX API that's entirely independent from msgpack::object itself. I don't believe any changes there would be necessary or worthwhile, it works well and efficiently.

@oberstet, if you want to keep msgpack::object as the variant type then there's nothing that msgpack-c needs to add: grab our code (ask for Boost relicensing if necessary) and you've got a msgpack::object <-> JSON conversion without unnecessary inefficiencies. The same could be done with msgpack::object <-> CBOR.

oberstet commented 8 years ago

@jpetso yes, I'd like to keep msgpack::object as the variant type in the user facing API of Autobahn - this was a longer discussion, I initially opposed that, but Dave (and you?) convinced my;) So let's stick to that. The only piece missing is ser/unser JSON/CBOR into msgpack::object.

Ah, I see .. so instead of extending msgpack-c, you are converting at WAMP message level https://github.com/tplgy/bonefish/blob/master/src/bonefish/serialization/json_serializer.cpp

jpetso commented 8 years ago

There's really nothing WAMP-specific about it, except how to parse the JSON string for binary data, but that's pluggable. It spits out a msgpack::object, which that code happens to use as WAMP message.

davidchappelle commented 8 years ago

My transport abstraction work includes some refactoring that will make adding support for JSON and CBOR quite easy.

On Thu, Jan 28, 2016 at 7:44 PM, Jakob Petsovits notifications@github.com wrote:

There's really nothing WAMP-specific about it, except how to parse the JSON string for binary data, but that's pluggable. It spits out a msgpack::object, which that code happens to use as WAMP message.

— Reply to this email directly or view it on GitHub https://github.com/msgpack/msgpack-c/issues/414#issuecomment-176498639.

redboltz commented 8 years ago

msgpack-c doesn't provide other types converter except JSON output. However, it provides an adaptation point named msgpack::type::variant. Let me explain in detail about that.

msgpack-c provides variant type named msgpack::object. It can convert to JSON string using output stream operator <<. msgpack::object is difficult to modify. So I added another variant type named msgpack::type::variant based on boost::variant. See the following example: http://melpon.org/wandbox/permlink/NWAlWzPNWHt6msii

Note: In order to use msgpack::type::variant, you need to define MSGPACK_USE_BOOST macro. When you use wandbox, online compiler that supports msgpack-c, add -DMSGPACK_USE_BOOST to the extra options text box and check MessagePack check box.

You can build msgpack::type::variant structure step by step. I think that insertion functions would be called in JSON and CBOR parser to convert to msgpack::type::variant.

You can use msgpack::type::variant with boost::static_visitor. So you can implement any converters using that. See the following example: https://github.com/msgpack/msgpack-c/tree/master/example/boost

msgpack::type::variant is not documented yet in wiki

Here is a little documentation: https://github.com/msgpack/msgpack-c/pull/349

oberstet commented 8 years ago

Now I am confused;)

autobahn-cpp _previously used boost::any as the dynamic type in the public, user-facing API and for shuffling around data parsed from the wire internally:

   /// A map holding any values and string keys.
   typedef std::map<std::string, boost::any> anymap;

   /// A vector holding any values.
   typedef std::vector<boost::any> anyvec;

   /// A pair of ::anyvec and ::anymap.
   typedef std::pair<anyvec, anymap> anyvecmap;

After a long discussion, we switch to msgpack::object instead of boost::any.

Are you suggesting to move to msgpack::type::variant, which seems to be a typedef for boost::variant?

It's funny: in a way, this the kind of situation I wanted to avoid originally by choosing boost::any: have a neutral ground w.r.t. serialization formats;)

redboltz commented 8 years ago

Are you suggesting to move to msgpack::type::variant, which seems to be a typedef for boost::variant?

I just wanted to inform you the recent version of the msgpack-c situation.

msgpack::type::variant is comparatively new introduced feature since version 1.2.0. It requires the boost libraries. Users can choose msgpack::object, msgpack::type::variant, or both.

Let's sum up the information to make decision.

I don't know much about autobahn-cpp's API history. I think that if there are already implemented converters, I recommend that check them out. And that have been introduced by @jpetso and @davidchappelle.

oberstet commented 8 years ago

msgpack::object is located on msgpack::zone

That was a main reason for switching from boost::any to msgpack::object for both internal and API in autobahn: memory efficiency and less dynamic allocations.

Both can easy to convert to different types.

"convert" sounds like double effort / memory.

I guess my question is: how would I parse CBOR from the wire, while creating a native msgpack::object on the fly and vice-versa, without creating some CBOR native object from the wire, and subsequently convert that to msgpack::object?

davidchappelle commented 8 years ago

boost::any was not well suited for custom data types. The previous api required that all information passed in was already converted to boost::any. Then internally there was a switch statement to figure out what the actual boost::any type was so that it could be serialized appropriately and placed on the wire. This was not an extensible solution. What if I was passing in a map of custom objects? As soon as we started using autobahn-cpp beyond the trivial use cases using intrinsic types it was immediately obvious that boost::any was a poor choice of abstraction. It also has performance overhead issues since all boost::any objects allocate storage on the heap even for intrinsic types. That equates to a lot of unnecessary malloc and copying overhead.

Converting to other types in our case is no different than what was being done before with respect to converting between wire format to boost:any to user-defined-types.

What we have now is really no different than what we had before except that msgpack::object has taken the place of boost::any.

Before

user-defined-type <=> boost::any <=> wire-format

Now

user-defined-type <=> msgpack::object <=> wire-format

I think that regardless of what is chosen as the variant there will be an intermediate step required to convert to and from the variant type.

On Saturday, 30 January 2016, Tobias Oberstein notifications@github.com wrote:

msgpack::object is located on msgpack::zone

That was a main reason for switching from boost::any to msgpack::object for both internal and API in autobahn: memory efficiency and less dynamic allocations.

Both can easy to convert to different types.

"convert" sounds like double effort / memory.

I guess my question is: how would I parse CBOR from the wire, while creating a native msgpack::object on the fly and vice-versa, without creating some CBOR native object from the wire, and subsequently convert that to msgpack::object?

— Reply to this email directly or view it on GitHub https://github.com/msgpack/msgpack-c/issues/414#issuecomment-177171641.

oberstet commented 8 years ago

@davidchappelle Thanks for reminding me of the other arguments there were for switching from boost::any! I have filed https://github.com/crossbario/autobahn-cpp/issues/86

Sidenote (opinionated): Why use a "user-defined-type" in the app in the first place? Why not use msgpack::object as the native app type? But that's a different, somewhat unrelated question anyway.

In above, I was more concerned that we might end up with:

Effectively putting JSON/CBOR second class (because of the necessary additional translation step in ()), and putting the translationg () code into autobahn-cpp, whereas if it would reside in msgpack-c, all users of msgpack-c would profit.

And hence my question of natively supporting JSON and CBOR directly within msgpack-c so we can have:

msgpack-c could still be "rightly" named "msgpack", as the API type is still msgpack::object though it would then support not only MessagePack, but also JSON and CBOR.

Would that be even feasible?

@jpetso Is that what you hinted at with the

I think a more practical variant of this feature request could be for msgpack-c to support a SAX-style deserialization API.

because having that would allow an efficient and cleanly separated implementation of the (*) translation?

jpetso commented 8 years ago

There is no need for msgpack-c to support any other serialization formats as part of the project as long as it offers interfaces for efficient packing and unpacking. Support for alternative formats can then be added as part of an independent project such as json-msgpack.

  1. In the case of "(anything) => wire-format (MessagePack)", the existing pack interfaces are sufficient.
  2. In the case of "wire-format (MessagePack) => (anything)", there are unnecessary inefficiencies because the current interfaces require the user to convert "wire-format (MessagePack) => msgpack::object => (anything)".

I filed msgpack-c #418 which has a slightly different focus than this issue but would take care of item 2. I also believe we have sorted out any remaining related issues in crossbario/autobahn-cpp#86.

I think that means this issue can be closed, since there is nothing else to do that requires tracking by this issue specifically. Is that correct? If so, please close.

davidchappelle commented 8 years ago

One thing to also note here. Unmarshaling to a msgpack::object doesn't have to include a full copy of bytes out of the received buffer. Instead, an option exists while unpacking to simply point msgpack::internal pointers to relevant offsets into the user supplied buffer. So although it is an extra step, any unnecessary copying can be avoided. With that said, I just had a read through msgpack-c #418 and I think that it would be a great optimization to take advantage of. However, since it is an internal detail, it shouldn't hold up any forward progress on supporting different serialization types.

On Wed, Feb 10, 2016 at 7:04 PM, Jakob Petsovits notifications@github.com wrote:

There is no need for msgpack-c to support any other serialization formats as part of the project as long as it offers interfaces for efficient packing and unpacking. Support for alternative formats can then be added as part of an independent project such as json-msgpack https://github.com/tplgy/json-msgpack.

  1. In the case of "(anything) => wire-format (MessagePack)", the existing pack interfaces are sufficient.
  2. In the case of "wire-format (MessagePack) => (anything)", there are unnecessary inefficiencies because the current interfaces require the user to convert "wire-format (MessagePack) => msgpack::object => (anything)".

I filed msgpack-c #418 https://github.com/msgpack/msgpack-c/issues/418 which has a slightly different focus than this issue but would take care of item 2. I also believe we have sorted out any remaining related issues in crossbario/autobahn-cpp#86 https://github.com/crossbario/autobahn-cpp/issues/86.

I think that means this issue can be closed, since there is nothing else to do that requires tracking by this issue specifically. Is that correct? If so, please close.

— Reply to this email directly or view it on GitHub https://github.com/msgpack/msgpack-c/issues/414#issuecomment-182642561.

oberstet commented 8 years ago

Thanks all for explaining / discussion. I am closing this issue as I was convinced the "actual itch" is better solved differently.