Closed Yoric closed 8 years ago
cc-ing https://github.com/serde-rs/json/pull/37.
Yeah, it's unfortunately a bit painful. Serde used to have this serialization, where for a given enum like:
enum Foo {
A,
B(usize, usize),
}
It used to serialize this into "A"
and {"B", [0]}
. We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.
Given that users might want things to be serialized in different manners, I've come to the feeling that it's the responsibility of serializers to have "some" consistent format, but it might not be the correct choice in every case. Other users might want enums to be serialized as an integer instead of a string. This is what cloudflare/gose does in their benchmark.
Instead, I've been thinking that we this might be better solved if we add in some attributes that help to customize the serialization format. What if we had something like:
#[derive(Serialize, Deserialize)]
#[serde(serialize_as_string)]
pub enum Foo {
A, B, C
}
#[derive(Serialize, Deserialize)]
#[serde(serialize_as_usize)]
pub enum Bar {
A, B, C
}
That at compile time would verify that all variants have no arguments, and have them generate code that serializes them as a &str
or a usize
? I'm open to suggestions for better names.
I've run into this same situation where enum values must be deserialized into objects. My workaround so far has been a two step deserialization. First, serde is used to deserialize the data into a StructRaw
where the enum member is typed as String. The second step tries to convert StructRaw
into Struct
and replace the stringified enum members with the actual enum variant. Naturally, this isn't ideal, and what's worse, it's surprising. Enum variants without data are being serialized and deserialized as if they did.
The old serialization @erickt mentions that was removed sounds like the correct default - type driven without surprising behavior. An application developer describes what their data looks like, and the de/serializer can figure out what to do. The format sometimes dictates whether they are encoded as usize or strings. Other times, as with JSON, the format is indifferent, and at that point it would be nice to hint to the serializer what to do as in @erickt's examples (#[serde(serialize_as_string)]
).
If serde did revert to the previous strategy, servo would need a way to keep its current behavior. It doesn't feel unreasonable (if you agree with my argument so far) to require an annotation to make that happen; for example, #[serde(enum_constant_shape)]
, or something to that effect could trigger the servo behavior.
I tend to agree with @jwilm that the least surprising is to serialize as string.
Also, my biggest issue is deserialization, which doesn't accept a raw string, despite it being much more logical.
Least surprising seems subjective based on target format. I'm just saying that transforming the layout of non tuple variants into objects is surprising regardless.
Here are some examples to clarify my stance on the various issues.
enum Foo {
A, B, C
}
Serde itself should not be opinionated about this. The target format should have a sensible default, (eg usize or &str). If it's a format like JSON that could easily encode them as either, serde should support specification via annotation as @erickt described.
enum Bar {
A,
B(usize)
}
Expected behavior for this case is serializing Bar::A as "A"
, and Bar::B(5) as "{B: [5]}"
assuming JSON. Generally, the latter should serialize as a tuple variant, and the former as as a &str or usize. If serde doesn't want to revert to this standard behavior, an opt-in attribute would be nice in addition to the blanket usize/&str attributes. Though, I'm still of the opinion that this should be the default, and the current behavior be opt-in.
Just want to make a small clarification to Yoric's deserialization example
let a : Foo = self::serde_json::from_str("A").unwrap();
This should be expected to fail considering A
is not valid JSON. Perhaps you meant this? (Extra quotation marks)
let a : Foo = self::serde_json::from_str("\"A\"").unwrap();
We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.
This reasoning is absurd! The performance problems of a completely different language for a single Rust project should not be complicating every other project written in Rust. I understand if we want to support this use case, but certainly do not make it the default!
@fuchsnj Thanks, I'll fix the example.
@erickt If you point me at the code, I can fix at least the deserialization part, to try and make sure that we also accept "A"
and not just { "A": [] }
. I believe that being more relaxed in deserialization won't hurt. Mostly, I need to know if it's a problem of serde or serde_json.
Just implement a branch for quotes here: https://github.com/serde-rs/json/blob/master/json/src/de.rs#L593
sigh I just ported a bunch of rustc-serialize based code over to serde 0.7 and hit this ( its a good thing I had unit tests! ). Is there a way to work around this to get the older behavior. The empty array behavior seems like a very special case. The old behavior seems like the common case.
Found my fix here. Dropping a for anyone else that needs the same solution.
@SimonSapin , what's Servo's take on this issue?
@Yoric I can’t speak for everyone on the Servo team on this. In particular:
It used to serialize this into "A" and {"B", [0]}. We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes.
@erickt, do you remember who said that and in what context? I wasn’t even aware that we had any serde-serialized JSON parsed by JavaScript.
My personal opinion of this issue is that it doesn’t matter. In my use cases so far, either:
Serialize
and Deserialize
traits on custom types, so the wire format is an implementation detail. Servo switched to the more efficient bincode instead of JSON for IPC.serde_json::value::Value
or rustc_serialize::json::Json
enum than try to shoehorn this format into the Serialize
and Deserialize
traits.I’ve found it easier to use the
serde_json::value::Value
orrustc_serialize::json::Json
enum than try to shoehorn this format into theSerialize
andDeserialize
traits.
Well, having derived Serialize
and Deserialize
to experiment is extremely useful. Unfortunately, the second I need to step out of Serialize
and Deserialize
and into manually using serde_json::value::Value
for just one of the 15 enums and 40 structs involved in this AST, I need to do it for all these enums and structs, which kind of kills the fun of it.
@erickt I'd go for either rolling back to the previous behavior or accepting a hint serialize_as_string
. Note that we probably want it to work per-variant, too, e.g. in the following, only variantA
can be meaningfully serialized as string:
enum Foo {
#[serde(serialize_as_string)]
A,
B(usize),
C{min: usize, max:usize}
}
@erickt any updated thoughts since all of the feedback?
I've started implementing Serialize by hand for some enum types just to work around this behavior.
This issue is complicating things for me in a few different projects. I'd prefer the old default that everyone seems to be asking for, but it would also be a workable replacement if we could use #[serde(serialize_with=$path)]
for enum variants in addition to struct fields.
Does anyone have a workaround for this until the Serde team addresses it? Is the only choice to implement the relevant traits manually?
@jimmycuadra I've been manually implementing the conversion when needed. Here's an example: https://gist.github.com/jwilm/30b62746a8f9dcd910cd470e2836a7cc. You might want to implement the conversion to a &str
with AsRef<str>
instead of a custom function as_str
as in the example.
Since we're doing this in several places, I ended up refactoring the above into a macro relying on a FromStr
implementation for the same type. Here's the macro for anyone interested.
I've just spend some time trying to figure out why my enum was not deserializing (I'm consuming and external json api).
I find the current behaviour especially surprising when trying to deserialize a C-like enum from a json string value.
Hoping to move this forward with some reflection on what's been discussed so far.
Users who have spoken up in this thread suggest that the intuitive behavior of serde in the case of enums with unit-like variants is to serialize them as a String
/usize
and not as a map. @erickt mentioned that Servo actually depends on the new behavior which treats them as a map and suggested introducing annotations to bring back the old behavior. However, @SimonSapin suggested this wasn't true of Servo's serde use.
The annotations would still be useful for hinting at the Serializer/Deserializer implementations for choosing usize vs string, but to require annotations by default has been shown to be counter-intuitive based on the activity in this thread.
It would be great to get some consensus on this decision from the serde team so that someone could make an appropriate PR to implement it. I noticed that some issues are starting to get marked with breakage, and it would be nice to get this resolved for the next breaking release (or earlier depending on the decision here).
As to not leave this too open-ended or hand wavey, here is a concrete proposal.
String
or usize
instead of a map.#[serde(unit_variant_as_usize)]
and #[serde(unit_variant_as_string)]
could hint at serializer/deserializer implementations to choose one over the other. This probably only matters for human consumable serialization formats such as yaml and json since binary formats would almost certainly want to use the more compact usize serialization.Servo isn’t a single person. I wrote above about my experience, but I can’t speak for everyone else on the Servo team.
d8fb2abd032ad1907d269c24d518a9695493a34b is the commit where enum serialization was initially changed to {"variant": ["fields, ...]}
. Unfortunately the message doesn't elaborate on the Servo question.
I also poked around in the Servo script component a bit, and the only references to serde::{Serialize, Deserialize}
traits were for empty JSTrace impls on ipc channel types.
The aforementioned commit is dated Date: Mon Jun 9 07:51:53 2014 -0700; the next step is looking for serde usage around that time, and we can maybe find out who requested this and whether its still relied upon.
The earliest commit I can find in Servo referencing serde are from @pcwalton servo/servo@6eacb0c99586865913c9e92a46a5b5945655bd7d almost 1 year after this change was made in serde.
Aside from serializing display lists, it looks like Servo's serde usage is almost exclusively related to IPC, and it might not care about the actual serialization layout as long as it's symmetric for IPC sender/receiver pairs. The IPC components don't use JSON anymore either, it's all bincode.
None of Servo's usage I can find appears reliant on unit-like enum variant serialization change being discussed here, but that's just what I can see as an outside observer.
There's not a lot of evidence to suggest that this is a real need (or maybe I'm just missing it):
We removed it on the request from the servo team because JavaScript optimizers prefer to have values in an array to have similar shapes
So, at this point, have we reached a consensus to restore the old behavior, and if not, who else should be in agreement before someone can submit a PR?
+1 for restoring the old behavior
+1 from me. I don't see any compelling argument for {"A": []}
.
The behavior from serde_codegen is correct though - it calls serialize_unit_variant. We need to fix serde_json (and serde_yaml and possibly others) to use the simpler format for unit variants. Let's resurrect https://github.com/serde-rs/json/pull/37 and just make sure it is able to deserialize both formats.
Well, would this PR solve the issue for serde_yaml then: dtolnay/serde-yaml#16?
Yes - although above all, I would like to keep serde_yaml consistent with serde_json because JSON is much more mainstream, so I would wait on serde_yaml until there is a solution in serde_json.
This has been fixed in serde_json 0.8.0 and serde_yaml 0.4.0.
Thanks @dtolnay!
I have worked only with serde_json, to I may be missing something that applies to other (de)serializers.
Serialization
Serialization of enums seems to always produce objects, even for trivial constructors
This produces
{"A":[]}
. I would prefer if it produced just the string"A"
.Deserialization
More importantly, deserialization of enums seems to always require objects, which is annoying when one attempts to implement a protocol.
I would have preferred if it managed to deserialize the string
"A"
into enum constructorA
.edit Fixed typo in the second example.