serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0
9.16k stars 775 forks source link

(De)serialization of enums could use a simpler format #251

Closed Yoric closed 8 years ago

Yoric commented 8 years ago

I have worked only with serde_json, to I may be missing something that applies to other (de)serializers.

Serialization

Serialization of enums seems to always produce objects, even for trivial constructors

#[derive(Serialize, Deserialize)]
pub enum Foo {
    A, B, C
}

fn main() {
  println!("{}", serde_json::to_string(&Foo::A).unwrap());
  // prints: `{"A":[]}`
}

This produces {"A":[]}. I would prefer if it produced just the string "A".

Deserialization

More importantly, deserialization of enums seems to always require objects, which is annoying when one attempts to implement a protocol.

#[derive(Serialize, Deserialize)]
pub enum Foo {
    A, B, C
}

fn main() {
  let a : Foo = self::serde_json::from_str("\"A\"").unwrap();
  // panics with a syntax error
}

I would have preferred if it managed to deserialize the string "A" into enum constructor A.

edit Fixed typo in the second example.

erickt commented 8 years ago

cc-ing https://github.com/serde-rs/json/pull/37.

Yeah, it's unfortunately a bit painful. Serde used to have this serialization, where for a given enum like:

enum Foo {
    A,
    B(usize, usize),
}

It used to serialize this into "A" and {"B", [0]}. We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.

Given that users might want things to be serialized in different manners, I've come to the feeling that it's the responsibility of serializers to have "some" consistent format, but it might not be the correct choice in every case. Other users might want enums to be serialized as an integer instead of a string. This is what cloudflare/gose does in their benchmark.

Instead, I've been thinking that we this might be better solved if we add in some attributes that help to customize the serialization format. What if we had something like:

#[derive(Serialize, Deserialize)]
#[serde(serialize_as_string)]
pub enum Foo {
    A, B, C
}

#[derive(Serialize, Deserialize)]
#[serde(serialize_as_usize)]
pub enum Bar {
    A, B, C
}

That at compile time would verify that all variants have no arguments, and have them generate code that serializes them as a &str or a usize? I'm open to suggestions for better names.

jwilm commented 8 years ago

I've run into this same situation where enum values must be deserialized into objects. My workaround so far has been a two step deserialization. First, serde is used to deserialize the data into a StructRaw where the enum member is typed as String. The second step tries to convert StructRaw into Struct and replace the stringified enum members with the actual enum variant. Naturally, this isn't ideal, and what's worse, it's surprising. Enum variants without data are being serialized and deserialized as if they did.

The old serialization @erickt mentions that was removed sounds like the correct default - type driven without surprising behavior. An application developer describes what their data looks like, and the de/serializer can figure out what to do. The format sometimes dictates whether they are encoded as usize or strings. Other times, as with JSON, the format is indifferent, and at that point it would be nice to hint to the serializer what to do as in @erickt's examples (#[serde(serialize_as_string)]).

If serde did revert to the previous strategy, servo would need a way to keep its current behavior. It doesn't feel unreasonable (if you agree with my argument so far) to require an annotation to make that happen; for example, #[serde(enum_constant_shape)], or something to that effect could trigger the servo behavior.

Yoric commented 8 years ago

I tend to agree with @jwilm that the least surprising is to serialize as string.

Also, my biggest issue is deserialization, which doesn't accept a raw string, despite it being much more logical.

jwilm commented 8 years ago

Least surprising seems subjective based on target format. I'm just saying that transforming the layout of non tuple variants into objects is surprising regardless.

jwilm commented 8 years ago

Here are some examples to clarify my stance on the various issues.

enum Foo {
    A, B, C
}

Serde itself should not be opinionated about this. The target format should have a sensible default, (eg usize or &str). If it's a format like JSON that could easily encode them as either, serde should support specification via annotation as @erickt described.

enum Bar {
    A,
    B(usize)
}

Expected behavior for this case is serializing Bar::A as "A", and Bar::B(5) as "{B: [5]}" assuming JSON. Generally, the latter should serialize as a tuple variant, and the former as as a &str or usize. If serde doesn't want to revert to this standard behavior, an opt-in attribute would be nice in addition to the blanket usize/&str attributes. Though, I'm still of the opinion that this should be the default, and the current behavior be opt-in.

fuchsnj commented 8 years ago

Just want to make a small clarification to Yoric's deserialization example

let a : Foo = self::serde_json::from_str("A").unwrap();

This should be expected to fail considering A is not valid JSON. Perhaps you meant this? (Extra quotation marks)

let a : Foo = self::serde_json::from_str("\"A\"").unwrap();
fuchsnj commented 8 years ago

We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.

This reasoning is absurd! The performance problems of a completely different language for a single Rust project should not be complicating every other project written in Rust. I understand if we want to support this use case, but certainly do not make it the default!

Yoric commented 8 years ago

@fuchsnj Thanks, I'll fix the example.

Yoric commented 8 years ago

@erickt If you point me at the code, I can fix at least the deserialization part, to try and make sure that we also accept "A" and not just { "A": [] }. I believe that being more relaxed in deserialization won't hurt. Mostly, I need to know if it's a problem of serde or serde_json.

oli-obk commented 8 years ago

Just implement a branch for quotes here: https://github.com/serde-rs/json/blob/master/json/src/de.rs#L593

softprops commented 8 years ago

sigh I just ported a bunch of rustc-serialize based code over to serde 0.7 and hit this ( its a good thing I had unit tests! ). Is there a way to work around this to get the older behavior. The empty array behavior seems like a very special case. The old behavior seems like the common case.

softprops commented 8 years ago

Found my fix here. Dropping a for anyone else that needs the same solution.

Yoric commented 8 years ago

@SimonSapin , what's Servo's take on this issue?

SimonSapin commented 8 years ago

@Yoric I can’t speak for everyone on the Servo team on this. In particular:

It used to serialize this into "A" and {"B", [0]}. We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.

@erickt, do you remember who said that and in what context? I wasn’t even aware that we had any serde-serialized JSON parsed by JavaScript.

My personal opinion of this issue is that it doesn’t matter. In my use cases so far, either:

Yoric commented 8 years ago

I’ve found it easier to use the serde_json::value::Value or rustc_serialize::json::Json enum than try to shoehorn this format into the Serialize and Deserialize traits.

Well, having derived Serialize and Deserialize to experiment is extremely useful. Unfortunately, the second I need to step out of Serialize and Deserialize and into manually using serde_json::value::Value for just one of the 15 enums and 40 structs involved in this AST, I need to do it for all these enums and structs, which kind of kills the fun of it.

@erickt I'd go for either rolling back to the previous behavior or accepting a hint serialize_as_string. Note that we probably want it to work per-variant, too, e.g. in the following, only variantA can be meaningfully serialized as string:

enum Foo {
  #[serde(serialize_as_string)]
  A,
  B(usize),
  C{min: usize, max:usize}
}
jwilm commented 8 years ago

@erickt any updated thoughts since all of the feedback?

I've started implementing Serialize by hand for some enum types just to work around this behavior.

jimmycuadra commented 8 years ago

This issue is complicating things for me in a few different projects. I'd prefer the old default that everyone seems to be asking for, but it would also be a workable replacement if we could use #[serde(serialize_with=$path)] for enum variants in addition to struct fields.

jimmycuadra commented 8 years ago

Does anyone have a workaround for this until the Serde team addresses it? Is the only choice to implement the relevant traits manually?

jwilm commented 8 years ago

@jimmycuadra I've been manually implementing the conversion when needed. Here's an example: https://gist.github.com/jwilm/30b62746a8f9dcd910cd470e2836a7cc. You might want to implement the conversion to a &str with AsRef<str> instead of a custom function as_str as in the example.

jwilm commented 8 years ago

Since we're doing this in several places, I ended up refactoring the above into a macro relying on a FromStr implementation for the same type. Here's the macro for anyone interested.

TimNN commented 8 years ago

I've just spend some time trying to figure out why my enum was not deserializing (I'm consuming and external json api).

I find the current behaviour especially surprising when trying to deserialize a C-like enum from a json string value.

jwilm commented 8 years ago

Hoping to move this forward with some reflection on what's been discussed so far.

Users who have spoken up in this thread suggest that the intuitive behavior of serde in the case of enums with unit-like variants is to serialize them as a String/usize and not as a map. @erickt mentioned that Servo actually depends on the new behavior which treats them as a map and suggested introducing annotations to bring back the old behavior. However, @SimonSapin suggested this wasn't true of Servo's serde use.

The annotations would still be useful for hinting at the Serializer/Deserializer implementations for choosing usize vs string, but to require annotations by default has been shown to be counter-intuitive based on the activity in this thread.

It would be great to get some consensus on this decision from the serde team so that someone could make an appropriate PR to implement it. I noticed that some issues are starting to get marked with breakage, and it would be nice to get this resolved for the next breaking release (or earlier depending on the decision here).

As to not leave this too open-ended or hand wavey, here is a concrete proposal.

  1. Revert to the original unit-like variant serialization/deserialization behavior. That is, expect a String or usize instead of a map.
  2. Support annotations on enums to choose usize/string behavior for unit-like variants. For example, #[serde(unit_variant_as_usize)] and #[serde(unit_variant_as_string)] could hint at serializer/deserializer implementations to choose one over the other. This probably only matters for human consumable serialization formats such as yaml and json since binary formats would almost certainly want to use the more compact usize serialization.
SimonSapin commented 8 years ago

Servo isn’t a single person. I wrote above about my experience, but I can’t speak for everyone else on the Servo team.

jwilm commented 8 years ago

d8fb2abd032ad1907d269c24d518a9695493a34b is the commit where enum serialization was initially changed to {"variant": ["fields, ...]}. Unfortunately the message doesn't elaborate on the Servo question.

I also poked around in the Servo script component a bit, and the only references to serde::{Serialize, Deserialize} traits were for empty JSTrace impls on ipc channel types.

The aforementioned commit is dated Date: Mon Jun 9 07:51:53 2014 -0700; the next step is looking for serde usage around that time, and we can maybe find out who requested this and whether its still relied upon.

jwilm commented 8 years ago

The earliest commit I can find in Servo referencing serde are from @pcwalton servo/servo@6eacb0c99586865913c9e92a46a5b5945655bd7d almost 1 year after this change was made in serde.

Aside from serializing display lists, it looks like Servo's serde usage is almost exclusively related to IPC, and it might not care about the actual serialization layout as long as it's symmetric for IPC sender/receiver pairs. The IPC components don't use JSON anymore either, it's all bincode.

None of Servo's usage I can find appears reliant on unit-like enum variant serialization change being discussed here, but that's just what I can see as an outside observer.

There's not a lot of evidence to suggest that this is a real need (or maybe I'm just missing it):

We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes

Lindenk commented 8 years ago

So, at this point, have we reached a consensus to restore the old behavior, and if not, who else should be in agreement before someone can submit a PR?

alexbool commented 8 years ago

+1 for restoring the old behavior

dtolnay commented 8 years ago

+1 from me. I don't see any compelling argument for {"A": []}.

The behavior from serde_codegen is correct though - it calls serialize_unit_variant. We need to fix serde_json (and serde_yaml and possibly others) to use the simpler format for unit variants. Let's resurrect https://github.com/serde-rs/json/pull/37 and just make sure it is able to deserialize both formats.

Lindenk commented 8 years ago

Well, would this PR solve the issue for serde_yaml then: dtolnay/serde-yaml#16?

dtolnay commented 8 years ago

Yes - although above all, I would like to keep serde_yaml consistent with serde_json because JSON is much more mainstream, so I would wait on serde_yaml until there is a solution in serde_json.

dtolnay commented 8 years ago

This has been fixed in serde_json 0.8.0 and serde_yaml 0.4.0.

jwilm commented 8 years ago

Thanks @dtolnay!