Closed RReverser closed 1 year ago
We are tracking a similar idea in #271.
Can you go into more detail about how you would expect validation to work during deserialization?
@dtolnay That one won't help with "add extra field to this one enum variant" case, and generally feels like overkill for simple compile-time constants.
As for validation - just meant reading value as its original type and comparing with defined constant. If they don't match, that's unexpected-value kind of error.
I think I have hit the same usecase with appc pod manifest, which has an internal constant field acKind
to identify its own type.
This may be seen somehow as a degenerate case of an internally tagged enum with a single variant.
Perhaps the tag
container-attribute could be just extended to structs as well, allowing:
#[serde(tag = "acKind", rename="PodManifest")]
pub struct AppcPod {
// All other real fields, but not acKind...
}
Regarding validation, I would just expect this to discriminate based on (possibly renamed) struct name.
I've also seen somewhere else a kind-of similar case of an "adjacently tagged struct" which I think can also be addressed via tag
+rename
+content
on structs:
{
"kind": "foo_type",
"value: { "foo": true }
}
#[serde(tag = "kind", rename="foo_type", content="value")]
pub struct Foo {
foo: bool,
}
I am open to supporting tag
and content
attributes on structs. I have seen a handful of use cases that have come up in IRC where this would have been useful.
I have another use case to add to the thread: https://github.com/peterhuene/azure-functions-rs/blob/master/azure-functions-shared/src/codegen/bindings/queue.rs
I have a bunch of structs that need to serialize with some constant data. In this case, the direction
and type
fields of the resulting JSON are always constant.
I'd love to be able to remove the custom Serialize
implementation in the future, if possible. Cheers!
I have another use case. I'm serializing an AST to JSON for testing purposes.
The tests look like this: https://github.com/projectfluent/fluent/blob/master/test/fixtures/messages.json
As you can see, there's plenty of "type": "NodeName"
all around, but serde only allows for tag
on Enum, which means that Resource, Identifier, Attribute, Pattern and so on in my AST require manual serialization to add them. That's a lot of boilerplate. Would be wonderful to annotate with a tag instead.
@zbraniecki FWIW ASTs are exactly the original use case for this issue :)
I have another use case, then :yum:
Serializing messages over the network, when there is not a single enum
wrapping all messages but the messages can be basically anything serializable, that will be matched with their type by trying to deserialize and assume the type was guessed right if deserialization succeeds :) Having a tag on the struct would most likely drastically reduce the number of false positives.
@Ekleog I believe that's a somewhat different case. You're still going to need to somehow express "one of these structs" in the type system, and that's already covered by an enum + #[serde(untagged)]
.
My expression of “one of these structs” is basically “whatever the user of my lib put inside a macro I provide”, I'm almost completely by-passing the type system here, and the type is basically “potentially whatever that implements a trait I provide” :) (I'm more or less making a match
on the the type to decode)
In order to make this work properly, I need that something encoded as struct Foo { foo: String }
cannot be decoded as struct Bar { foo: String }
, though :)
Wrapping these structs in an enum with a single variant and #{serde(untagged)]
would be a solution to ensure that, but it's not really efficient for users of my lib, and I can't auto-generate struct names from TypeId
because TypeId
isn't stable across Rust releases.
Problem is, just tags themselves are not enough for serde to know the destination types. It needs to statically know all the possible properties and types as it will be trying to choose how to decode based on their shape. You can't just say "decode anything" and then somehow cast to one of the structs in your library, even if they're tagged.
I know, that's why I say it's best-effort matching the struct :)
To be more precise, here is an example use case:
match_deserialize!(some_string, {
Foo: x => foo(x),
Bar: x => bar(x),
})
That would expand to (pseudo-code, I haven't checked the correct functions):
match some_string.try_deserialize::<Foo>() {
Ok(x) => foo(x),
Err(_) => match some_string.try_deserialize::<Bar>() {
Ok(x) => bar(x),
Err(_) => panic!("Unexpected string"),
}
}
With such a setup there is no need for an enum, and it would be very inconvenient to have to prepare an enum of all types that could ever possibly be given to try_deserialize
. (actually, in real code, I'm not panicking if nothing matches, but waiting for the next string from the network to try again, and keeping this failed string for later usage)
Hence my use case for adding a tag based on the struct's name, to reduce false positives (if, eg., in the example above, Foo
and Bar
had identically-named fields, then even if some_string
was a serialized Bar
, it would be parsed as a Foo
)
@zbraniecki FWIW ASTs are exactly the original use case for this issue :)
Gotcha!
Just to verify - is there any workaround that I could use as of today, or do I need to add a manual serialization for all of my AST node structs? :)
With such a setup there is no need for an enum, and it would be very inconvenient to have to prepare an enum of all types
Why? You're already using a macro with possible branches, you can just expand to an enum, let serde do its job, and match based on the result.
@Ekleog So what you want should be achievable with something like this: https://play.rust-lang.org/?gist=c7b55508907b4bc482468b44d9a8bd73&version=stable&mode=debug&edition=2015
But yeah, I can see how tags on structures would be helpful in this case too.
do I need to add a manual serialization for all of my AST node structs? :)
@zbraniecki No need to do manual serialisation, just put your structs into an enum. This seems similar to what @Ekleog has been asking, and, unlike extra non-tag fields, doesn't necessarily require any changes in serde.
@RReverser The issue with tagging is not at deserialization time (where there could be enums indeed), but at serialization time, because serialization is done with a function similar to do_serialize<T: MyTrait>(t: T)
. Here, there is no macro that could help. :) Oh, and your solution doesn't work when trying to deserialize into a generic struct, because it's not a simple :ident
.
Anyway, this is getting pretty off-topic, so let's continue the discussion somewhere else (eg. #serde
, I'm ekleog
there too) :)
@zbraniecki No need to do manual serialisation, just put your structs into an enum. This seems similar to what @Ekleog has been asking, and, unlike extra non-tag fields, doesn't necessarily require any changes in serde.
Hmm, I'm not sure what you mean.
I have a lot of cases like this: https://play.rust-lang.org/?gist=230561d4086ff0f8fc54d1ae4ea2bc09&version=stable&mode=debug&edition=2015
I don't control the Resource
, but I do control the ResourceDef
. How should I turn it into an enum
to add the type
to the output?
Woah, I pretty much accidentally found out that #1448 implemented tags on structs (somehow missed before).
The only thing I don't like is that tag
is ignored during deserialisation, whereas above I suggested that it has to be used for validation, but other than that it feels very close to what I hoped for.
@RReverser I actually proposed that it in #1475, but it was rejected.
@jwillbold Hmm, maybe because the main focus was on untagged enums, while validating tags on structs would be good (and more in line with other serde(tag)
behaviour) regardless of where they're used.
@dtolnay what do you think?
For validation of tags on structs I would prefer to see it implemented as an attribute macro:
#[validate_tag]
#[derive(Deserialize)]
struct Program {
body: Vec<Stmt>,
}
// expands to:
#[derive(Deserialize)]
#[serde(tag = "type", remote = "Self")]
struct Program {
body: Vec<Stmt>,
}
impl<'de> Deserialize<'de> for Program {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
Program::deserialize(Validate {
deserializer,
key: "type",
value: "Program",
})
}
}
@dtolnay Sure, anything can be implemented outside of serde core, but things like this one feel a bit inconsistent and more like a bug. I'd expect #[serde(tag = ...)] struct S { .. }
to behave in the same way as #[serde(tag = ...)] enum S { /* single variant */ S { ... } }
.
This is especially bad with empty structs btw, where {"type":"A"}
can silently deserialise into struct B {}
and vice versa, which can lead to easily missed mistakes in AST.
A possible workaround to have constant "tags" on structs without a custom deserializer is to use a single-variant enum with the desired name..
#[derive(Deserialize)]
pub enum ConstFoo {
#[serde(rename = "foo")] Foo
}
#[derive(Deserialize)]
pub struct MyStruct {
tag_field: ConstFoo,
other_field: ...
}
Has there been any progress on having constant 'tag' fields in structs other than using the (quite convoluted) 'single enum variant' trick?
I am currently running into the same situation, where one field of my struct is constant and always needs to match a certain character sequence to deserialize properly. Using #[tag/rename]
on a struct does the correct thing on serializing, but not while deserializing.
You can use tags on structs, I implemented in #1448. Maybe it never made its way in the docs?
No, it didn't: https://github.com/serde-rs/serde-rs.github.io/pull/103
As far as I am aware, it still does not respect deserialization: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=af10b2cd5bf538d58f91bde2bd680b58
You can use https://crates.io/crates/monostate for constant data. The only downside is that you still need to include it as a field of your struct.
Yes, it does not support deserialization, I proposed it in #1475, it was rejected
I think with #1448, one of the more common use cases for this request has been addressed. I know there is a long tail of extremely diverse remaining cases that would probably involve a dozen or more new serde_derive attributes to accommodate, but I think the line is drawn in the correct place for now and I would prefer to leave the rest to be dealt with in handwritten Serialize impls. Alternatively if someone wished to pursue making a more fully featured derive macro I'm sure that would be welcomed.
Serde allows to use
tag = "..."
on enums, but I have a format in which there are also "tags" on structs - basically extra constant fields that must be present in serialized form to pass validations, but as Rust is typed, they don't matter in Rust representation.For the sake of example:
This must be serialized as
{ "type": "Program", "body": [...] }
.Another example, as part of enum:
Of course, I can add extra meaningless
type: &'static str
toProgram
andvalue: ()
toNullLiteral
or implement customSerialize
/serialize_with
for each of these cases, but as it seems to be a quite common repetitive task, would it make sense to add support for constant fields in Serde itself via a separate attribute?For deserialize implementation, those fields would be just checked for existence (like
tag
on enums), and for serialize they would be implicitly added.