mneumann / rust-msgpack

[DEPRECATED] msgpack.org implementation for Rust language. Please use https://github.com/3Hren/msgpack-rust
121 stars 31 forks source link

Structs are encoded w/ map type but are not valid msgpack maps! #40

Open drbawb opened 9 years ago

drbawb commented 9 years ago

I've been doing interop with other msgpack libraries this weekend, namely the official msgpack-javascript.

The problem is that we currently call _emit_map_len() before encoding a Rust struct into the stream. This pops the msgpack type fixmap (0x8<len>) onto the stream.

However when encoding struct fields we only encode the value of the field, which means this is not a valid msgpack map entry.

I have a fix for this over on my fork (though it's written on top of PR #39): feature/fix-struct-interop

The fix involves emitting the field-name when encoding; and reading/checking the field-name when decoding.


Again this only affects non-Rust readers: as they are expecting a map-key where one is not present. In my experience this means the fixmap is decoded into a single key (containing all the bytes of the struct) and then the rest of the stream is corrupted.

Would such a fix be welcome? Though it changes the semantics of how we're encoding structs?

mneumann commented 9 years ago

This is pretty inefficient. I'd like to see how other msgpack libraries encode structs.

drbawb commented 9 years ago

I could drop the assertion that field-names match which would eliminate the comparison (since the Rust decoder already knows this information.)

Though obviously there'd still be the penalty encoding the field name to make a valid msgpack map.


What if we just dropped the _emit_map_len() and _read_map_len() from emit_struct() and read_struct() and encoded structs as an array of fields instead?

The msgpack array type will still have a length; so we can still perform the sanity check on the number of struct fields during deserialization.

mneumann commented 9 years ago

Array of fields would work too, and I am wondering why I didn't do that! Can you try to encode structs as an array? I think the reason why I was using _emit_map_len() were some legacy msgpack files I had to use. Ideally msgpack would support "struct" natively.

drbawb commented 9 years ago

Check out this branch.

It just is a drop in replacement of _emit_array_len() and _read_vec_len(); and it passes all the current tests.

It should perform identically to the map version; the major win here is that third-party decoders will read our structs as an array of fields, rather than a corrupted map.

Also I just wanted to add that serialize::json does actually encode structs as maps (e.g JSON objects) so using the map type is not without precedent. Though I agree that encoding the field-names is somewhat contrary to the goal of msgpack -- which is meant to be a compact serialization format.


My main concern is that encoding structs as arrays would be a "breaking change." In the sense that this new decoder will not be able to read Rust structs emitted by the previous version of the encoder. This would obviously be problematic for anyone that has rust-msgpack data sitting around.

I'm toying around with the idea of using Cargo feature flags so that you can optionally compile msgpack with either the old map-based behavior, or this new array-based behavior.

We would just annotate the read_struct/emit_struct functions with attributes such as #[cfg(feature="struct-array")] or #[cfg(feature="struct-map")] and the user would declare which one they want to use in their own Cargo.toml.

rail44 commented 9 years ago

Hi, I've created msgpack encoder/decoder for my studying. https://github.com/rail44/msgpack-rs/ I made Struct to be encoded to Map. It may contain useful code for rust-msgpack implementing, FYI.

jpastuszek commented 9 years ago

The idea behind msgpack IMHO is that it needs no schema files; if you don't include key values then you relay on schema (rust structure definition) and this breaks this design. Also it is important to keep compatibility with different languages and Ruby won't parse current structure output. JSON serialised stuctures apparently encodes key values. Also I don's see any other methods of creating msgpack message with map of mixed value types other than using structs. So please encode key values as well.

There are different formats that are focused on efficiency by use of schema files.

mneumann commented 9 years ago

@jpastuszek @drbawb: I think the problem with this is that we still can't read back structs from other languages, as they might (and very probably will) encode the key/value pairs in a different order than we are trying to read them. Actually, IIRC, the msgpack spec does not say anything about structs in general only about maps. That is, directly mapping a (Rust) struct to a hashmap is a difficult thing inter-language-wise, or at least doing so without buffering the input.

Can we get this to work:

Rust::msgpack_encode(struct) -> Ruby::msgpack_decode(str).msgpack_encode -> Rust::msgpack_decode(str) == struct

I can only think of an array of tuples [(fieldname1, value1), ...]. This would have the advantage that we could also optionally remove the fieldname and it'd still work.

jpastuszek commented 9 years ago

Recently I have been thinking that the problem is in deserialisation. What your are really telling the library when you try to deserialise a struct is set of expectations on content of the message:

Now if there are any extra fields in the map they should be ignored. If one of struct data types is Option and filed in the map is missing you get None value.

Serialisation then is rather straight forward.

I am aware that this cannot be implemented with current API though...