serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0
9.16k stars 774 forks source link

Serialize extra constant fields #760

Closed RReverser closed 1 year ago

RReverser commented 7 years ago

Serde allows to use tag = "..." on enums, but I have a format in which there are also "tags" on structs - basically extra constant fields that must be present in serialized form to pass validations, but as Rust is typed, they don't matter in Rust representation.

For the sake of example:

pub struct Program {
  // type: "Program"
  pub body: Vec<Stmt>
}

This must be serialized as { "type": "Program", "body": [...] }.

Another example, as part of enum:

#[serde(tag = "type")]
pub enum Expr {
  ...
  #[serde(rename = "Literal")]
  NullLiteral // must be serialized as `{ "type": "Literal", "value": null }` not as `{ "type": "Literal" }`
}

Of course, I can add extra meaningless type: &'static str to Program and value: () to NullLiteral or implement custom Serialize / serialize_with for each of these cases, but as it seems to be a quite common repetitive task, would it make sense to add support for constant fields in Serde itself via a separate attribute?

For deserialize implementation, those fields would be just checked for existence (like tag on enums), and for serialize they would be implicitly added.

dtolnay commented 7 years ago

We are tracking a similar idea in #271.

Can you go into more detail about how you would expect validation to work during deserialization?

RReverser commented 7 years ago

@dtolnay That one won't help with "add extra field to this one enum variant" case, and generally feels like overkill for simple compile-time constants.

As for validation - just meant reading value as its original type and comparing with defined constant. If they don't match, that's unexpected-value kind of error.

lucab commented 7 years ago

I think I have hit the same usecase with appc pod manifest, which has an internal constant field acKind to identify its own type.

This may be seen somehow as a degenerate case of an internally tagged enum with a single variant.

Perhaps the tag container-attribute could be just extended to structs as well, allowing:

#[serde(tag = "acKind", rename="PodManifest")]
pub struct AppcPod {
  // All other real fields, but not acKind...
}

Regarding validation, I would just expect this to discriminate based on (possibly renamed) struct name.

I've also seen somewhere else a kind-of similar case of an "adjacently tagged struct" which I think can also be addressed via tag+rename+content on structs:

{
  "kind": "foo_type",
  "value: { "foo": true }
}

#[serde(tag = "kind", rename="foo_type", content="value")]
pub struct Foo {
  foo: bool,
}
dtolnay commented 7 years ago

I am open to supporting tag and content attributes on structs. I have seen a handful of use cases that have come up in IRC where this would have been useful.

peterhuene commented 6 years ago

I have another use case to add to the thread: https://github.com/peterhuene/azure-functions-rs/blob/master/azure-functions-shared/src/codegen/bindings/queue.rs

I have a bunch of structs that need to serialize with some constant data. In this case, the direction and type fields of the resulting JSON are always constant.

I'd love to be able to remove the custom Serialize implementation in the future, if possible. Cheers!

zbraniecki commented 6 years ago

I have another use case. I'm serializing an AST to JSON for testing purposes.

The tests look like this: https://github.com/projectfluent/fluent/blob/master/test/fixtures/messages.json

As you can see, there's plenty of "type": "NodeName" all around, but serde only allows for tag on Enum, which means that Resource, Identifier, Attribute, Pattern and so on in my AST require manual serialization to add them. That's a lot of boilerplate. Would be wonderful to annotate with a tag instead.

RReverser commented 6 years ago

@zbraniecki FWIW ASTs are exactly the original use case for this issue :)

Ekleog commented 6 years ago

I have another use case, then :yum:

Serializing messages over the network, when there is not a single enum wrapping all messages but the messages can be basically anything serializable, that will be matched with their type by trying to deserialize and assume the type was guessed right if deserialization succeeds :) Having a tag on the struct would most likely drastically reduce the number of false positives.

RReverser commented 6 years ago

@Ekleog I believe that's a somewhat different case. You're still going to need to somehow express "one of these structs" in the type system, and that's already covered by an enum + #[serde(untagged)].

Ekleog commented 6 years ago

My expression of “one of these structs” is basically “whatever the user of my lib put inside a macro I provide”, I'm almost completely by-passing the type system here, and the type is basically “potentially whatever that implements a trait I provide” :) (I'm more or less making a match on the the type to decode)

In order to make this work properly, I need that something encoded as struct Foo { foo: String } cannot be decoded as struct Bar { foo: String }, though :)

Wrapping these structs in an enum with a single variant and #{serde(untagged)] would be a solution to ensure that, but it's not really efficient for users of my lib, and I can't auto-generate struct names from TypeId because TypeId isn't stable across Rust releases.

RReverser commented 6 years ago

Problem is, just tags themselves are not enough for serde to know the destination types. It needs to statically know all the possible properties and types as it will be trying to choose how to decode based on their shape. You can't just say "decode anything" and then somehow cast to one of the structs in your library, even if they're tagged.

Ekleog commented 6 years ago

I know, that's why I say it's best-effort matching the struct :)

To be more precise, here is an example use case:

match_deserialize!(some_string, {
    Foo: x => foo(x),
    Bar: x => bar(x),
})

That would expand to (pseudo-code, I haven't checked the correct functions):

match some_string.try_deserialize::<Foo>() {
    Ok(x) => foo(x),
    Err(_) => match some_string.try_deserialize::<Bar>() {
        Ok(x) => bar(x),
        Err(_) => panic!("Unexpected string"),
    }
}

With such a setup there is no need for an enum, and it would be very inconvenient to have to prepare an enum of all types that could ever possibly be given to try_deserialize. (actually, in real code, I'm not panicking if nothing matches, but waiting for the next string from the network to try again, and keeping this failed string for later usage)

Hence my use case for adding a tag based on the struct's name, to reduce false positives (if, eg., in the example above, Foo and Bar had identically-named fields, then even if some_string was a serialized Bar, it would be parsed as a Foo)

zbraniecki commented 6 years ago

@zbraniecki FWIW ASTs are exactly the original use case for this issue :)

Gotcha!

Just to verify - is there any workaround that I could use as of today, or do I need to add a manual serialization for all of my AST node structs? :)

RReverser commented 6 years ago

With such a setup there is no need for an enum, and it would be very inconvenient to have to prepare an enum of all types

Why? You're already using a macro with possible branches, you can just expand to an enum, let serde do its job, and match based on the result.

RReverser commented 6 years ago

@Ekleog So what you want should be achievable with something like this: https://play.rust-lang.org/?gist=c7b55508907b4bc482468b44d9a8bd73&version=stable&mode=debug&edition=2015

But yeah, I can see how tags on structures would be helpful in this case too.

RReverser commented 6 years ago

do I need to add a manual serialization for all of my AST node structs? :)

@zbraniecki No need to do manual serialisation, just put your structs into an enum. This seems similar to what @Ekleog has been asking, and, unlike extra non-tag fields, doesn't necessarily require any changes in serde.

Ekleog commented 6 years ago

@RReverser The issue with tagging is not at deserialization time (where there could be enums indeed), but at serialization time, because serialization is done with a function similar to do_serialize<T: MyTrait>(t: T). Here, there is no macro that could help. :) Oh, and your solution doesn't work when trying to deserialize into a generic struct, because it's not a simple :ident.

Anyway, this is getting pretty off-topic, so let's continue the discussion somewhere else (eg. #serde, I'm ekleog there too) :)

zbraniecki commented 6 years ago

@zbraniecki No need to do manual serialisation, just put your structs into an enum. This seems similar to what @Ekleog has been asking, and, unlike extra non-tag fields, doesn't necessarily require any changes in serde.

Hmm, I'm not sure what you mean.

I have a lot of cases like this: https://play.rust-lang.org/?gist=230561d4086ff0f8fc54d1ae4ea2bc09&version=stable&mode=debug&edition=2015

I don't control the Resource, but I do control the ResourceDef. How should I turn it into an enum to add the type to the output?

RReverser commented 5 years ago

Woah, I pretty much accidentally found out that #1448 implemented tags on structs (somehow missed before).

The only thing I don't like is that tag is ignored during deserialisation, whereas above I suggested that it has to be used for validation, but other than that it feels very close to what I hoped for.

jwillbold commented 5 years ago

@RReverser I actually proposed that it in #1475, but it was rejected.

RReverser commented 5 years ago

@jwillbold Hmm, maybe because the main focus was on untagged enums, while validating tags on structs would be good (and more in line with other serde(tag) behaviour) regardless of where they're used.

@dtolnay what do you think?

dtolnay commented 5 years ago

For validation of tags on structs I would prefer to see it implemented as an attribute macro:

#[validate_tag]
#[derive(Deserialize)]
struct Program {
    body: Vec<Stmt>,
}
// expands to:

#[derive(Deserialize)]
#[serde(tag = "type", remote = "Self")]
struct Program {
    body: Vec<Stmt>,
}

impl<'de> Deserialize<'de> for Program {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        Program::deserialize(Validate {
            deserializer,
            key: "type",
            value: "Program",
        })
    }
}
RReverser commented 5 years ago

@dtolnay Sure, anything can be implemented outside of serde core, but things like this one feel a bit inconsistent and more like a bug. I'd expect #[serde(tag = ...)] struct S { .. } to behave in the same way as #[serde(tag = ...)] enum S { /* single variant */ S { ... } }.

RReverser commented 5 years ago

This is especially bad with empty structs btw, where {"type":"A"} can silently deserialise into struct B {} and vice versa, which can lead to easily missed mistakes in AST.

njam commented 5 years ago

A possible workaround to have constant "tags" on structs without a custom deserializer is to use a single-variant enum with the desired name..

#[derive(Deserialize)]
pub enum ConstFoo {
    #[serde(rename = "foo")] Foo
}

#[derive(Deserialize)]
pub struct MyStruct {
    tag_field: ConstFoo,
    other_field: ...
}
TheNeikos commented 1 year ago

Has there been any progress on having constant 'tag' fields in structs other than using the (quite convoluted) 'single enum variant' trick?

I am currently running into the same situation, where one field of my struct is constant and always needs to match a certain character sequence to deserialize properly. Using #[tag/rename] on a struct does the correct thing on serializing, but not while deserializing.

jwillbold commented 1 year ago

You can use tags on structs, I implemented in #1448. Maybe it never made its way in the docs?

jplatte commented 1 year ago

No, it didn't: https://github.com/serde-rs/serde-rs.github.io/pull/103

TheNeikos commented 1 year ago

As far as I am aware, it still does not respect deserialization: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=af10b2cd5bf538d58f91bde2bd680b58

jonasbb commented 1 year ago

You can use https://crates.io/crates/monostate for constant data. The only downside is that you still need to include it as a field of your struct.

jwillbold commented 1 year ago

Yes, it does not support deserialization, I proposed it in #1475, it was rejected

dtolnay commented 1 year ago

I think with #1448, one of the more common use cases for this request has been addressed. I know there is a long tail of extremely diverse remaining cases that would probably involve a dozen or more new serde_derive attributes to accommodate, but I think the line is drawn in the correct place for now and I would prefer to leave the rest to be dealt with in handwritten Serialize impls. Alternatively if someone wished to pursue making a more fully featured derive macro I'm sure that would be welcomed.