serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0

[Feature Request] Enforce sorted fields in serialization and/or deserialization #2824

Open oxalica opened 5 days ago

oxalica commented 5 days ago

Currently, serialization is done in field definition order, and deserialization accepts fields in arbitrary order. But some canonical serialization formats, such as RFC8785/JCS, require fields to be serialized in a canonical order, typically sorted alphabetically by serialized name, so that the output is immune to refactoring. This is also mentioned in #2368 for canonical CBOR.
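RFC 8785 pins the canonical order down precisely: member names are compared as sequences of UTF-16 code units. The std-only Rust sketch below applies that comparison to a struct's field names and shows one case where it disagrees with Rust's default byte-wise `str` ordering. The function names are hypothetical and are not part of serde or serde_jcs.

```rust
use std::cmp::Ordering;

/// Compares two member names the way RFC 8785 (JCS) specifies:
/// by their UTF-16 code units, not by UTF-8 bytes.
fn jcs_key_cmp(a: &str, b: &str) -> Ordering {
    a.encode_utf16().cmp(b.encode_utf16())
}

/// Returns the given names in canonical (JCS) order.
fn canonical_order<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut sorted = names.to_vec();
    sorted.sort_by(|a, b| jcs_key_cmp(a, b));
    sorted
}

fn main() {
    // Field names in definition order; refactoring may shuffle these,
    // but the canonical order stays the same.
    let definition_order = ["name", "id", "created_at"];
    println!("{:?}", canonical_order(&definition_order)); // ["created_at", "id", "name"]

    // For ASCII names both orders agree, but they differ once
    // supplementary-plane characters (surrogate pairs in UTF-16) appear.
    let bmp = "\u{E000}"; // UTF-8: EE 80 80,    UTF-16: E000
    let astral = "\u{10000}"; // UTF-8: F0 90 80 80, UTF-16: D800 DC00
    assert_eq!(bmp.cmp(astral), Ordering::Less); // byte-wise order
    assert_eq!(jcs_key_cmp(bmp, astral), Ordering::Greater); // JCS order
}
```

This is why "alphabetically sorted" needs a concrete comparator if the goal is byte-exact JCS output; for plain ASCII field names the distinction does not matter.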

This could be achieved with an external Serializer, but at the cost of buffering the serialized form of each field, sorting the fields, and copying them back, which is how serde_jcs currently works. The alternative, changing the order inside the generated Serialize impl, is zero-cost, and does not break deserialization as long as deserializers accept arbitrary order.
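To make the cost difference concrete, here is a std-only Rust sketch contrasting the two strategies. It is not serde_jcs's actual code: the function names are hypothetical, values are passed in as already-serialized JSON text, and string escaping is ignored for brevity.

```rust
/// Post-hoc canonicalization, roughly what an external Serializer has to do:
/// every value is first rendered into its own String, then the (key, text)
/// pairs are sorted and copied into the final output.
fn write_object_buffered(mut fields: Vec<(&str, String)>) -> String {
    // Sort by UTF-16 code units, as RFC 8785 requires.
    fields.sort_by(|(a, _), (b, _)| a.encode_utf16().cmp(b.encode_utf16()));
    let members: Vec<String> = fields
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", k, v))
        .collect();
    format!("{{{}}}", members.join(","))
}

/// What derived code could do instead if the order is fixed when the code is
/// generated: write members straight to the output, no buffers and no sort.
fn write_object_presorted(out: &mut String, id: u32, name: &str) {
    // "id" < "name" was decided at code-generation time.
    out.push_str("{\"id\":");
    out.push_str(&id.to_string());
    out.push_str(",\"name\":\"");
    out.push_str(name); // escaping omitted in this sketch
    out.push_str("\"}");
}

fn main() {
    let buffered = write_object_buffered(vec![
        ("name", String::from("\"Ferris\"")),
        ("id", String::from("7")),
    ]);

    let mut direct = String::new();
    write_object_presorted(&mut direct, 7, "Ferris");

    assert_eq!(buffered, direct);
    println!("{}", direct); // {"id":7,"name":"Ferris"}
}
```

Both produce the same bytes; the difference is that the buffered path allocates per field and sorts at runtime, while the pre-sorted path moves that work to compile time.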

For deserialization, IIUC, there is currently no way to enforce a field order other than the struct definition order. Validating a canonical serialization can only be done by re-serializing and comparing the result. It's also mentioned in https://github.com/serde-rs/serde/pull/2250#issuecomment-1196098237 that enforcing a fixed deserialization order can simplify the generated code, which helps both compile time and binary size.
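The simplification argument can be illustrated with a small std-only Rust sketch. The `&[(&str, i64)]` slice is a hypothetical stand-in for the key/value stream serde exposes through MapAccess, and `Point` is an example type; real serde_derive output is considerably more involved. The arbitrary-order version needs an `Option` slot per field plus duplicate and missing-field checks, while the fixed-order version just matches each expected key against the next entry.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

/// Arbitrary order: one Option slot per field, plus duplicate and
/// missing-field checks. Unknown keys are rejected here for brevity
/// (as with deny_unknown_fields).
fn from_any_order(entries: &[(&str, i64)]) -> Result<Point, String> {
    let (mut x, mut y): (Option<i64>, Option<i64>) = (None, None);
    for &(key, value) in entries {
        let slot = match key {
            "x" => &mut x,
            "y" => &mut y,
            other => return Err(format!("unknown field `{}`", other)),
        };
        if slot.replace(value).is_some() {
            return Err(format!("duplicate field `{}`", key));
        }
    }
    Ok(Point {
        x: x.ok_or("missing field `x`")?,
        y: y.ok_or("missing field `y`")?,
    })
}

/// Fixed order: take the next entry and require it to be `name`.
fn take_field(it: &mut std::slice::Iter<'_, (&str, i64)>, name: &str) -> Result<i64, String> {
    match it.next() {
        Some(&(key, value)) if key == name => Ok(value),
        Some(&(key, _)) => Err(format!("expected field `{}`, found `{}`", name, key)),
        None => Err(format!("missing field `{}`", name)),
    }
}

/// Fixed order: values are bound directly, no slots or duplicate tracking.
fn from_fixed_order(entries: &[(&str, i64)]) -> Result<Point, String> {
    let mut it = entries.iter();
    let x = take_field(&mut it, "x")?;
    let y = take_field(&mut it, "y")?;
    match it.next() {
        None => Ok(Point { x, y }),
        Some(&(key, _)) => Err(format!("unexpected field `{}`", key)),
    }
}

fn main() {
    let sorted: [(&str, i64); 2] = [("x", 1), ("y", 2)];
    let shuffled: [(&str, i64); 2] = [("y", 2), ("x", 1)];
    println!("{:?}", from_any_order(&shuffled)); // Ok(Point { x: 1, y: 2 })
    println!("{:?}", from_fixed_order(&sorted)); // Ok(Point { x: 1, y: 2 })
    println!("{:?}", from_fixed_order(&shuffled)); // Err("expected field `x`, found `y`")
}
```

As a side effect, the fixed-order version rejects non-canonical input directly, without the re-serialize-and-compare step.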

I'm proposing adding these container attributes:

  1. serde(serialize_sorted): serialize fields in the alphabetical order of their serialized names (after renaming), instead of the field definition order. It can be applied to named structs and all four kinds of tagged enums. For tuple structs and newtype structs, it's a no-op. It's incompatible with any non-skipped serde(flatten) fields.
  2. serde(deserialize_sorted): enforce the field order during deserialization. Its compatibility is the same as serde(serialize_sorted), and it is additionally compatible with serde(deny_unknown_fields).
  3. serde(sorted) as an alias that enables both of the above.

(1) is fairly easy to implement in serde_derive: sorting the Fields before expansion is mostly enough, plus a bit of additional handling for the tag. (2) may need more work in the generated code.
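The "sort before expand" step for (1) might look roughly like the std-only Rust sketch below. `Field` here is a hypothetical, heavily simplified stand-in for the per-field information serde_derive collects from its attributes, not its real internal type. Comparing by UTF-16 code units is an assumption chosen to match JCS; the proposal itself only says "alphabetically".

```rust
/// Hypothetical, simplified per-field attribute data.
struct Field {
    ident: &'static str,
    rename: Option<&'static str>,
    skip: bool,
    flatten: bool,
}

impl Field {
    /// The name that appears in the output: the rename if present.
    fn serialized_name(&self) -> &'static str {
        self.rename.unwrap_or(self.ident)
    }
}

/// Computes the order a `serialize_sorted` expansion would emit fields in,
/// rejecting non-skipped flattened fields as the proposal specifies.
fn sorted_serialization_order(fields: &[Field]) -> Result<Vec<&'static str>, String> {
    let mut emitted: Vec<&Field> = fields.iter().filter(|f| !f.skip).collect();
    if let Some(f) = emitted.iter().find(|f| f.flatten) {
        return Err(format!(
            "serialize_sorted cannot be combined with flattened field `{}`",
            f.ident
        ));
    }
    // Sort by the serialized name (after renaming), not the Rust identifier.
    emitted.sort_by(|a, b| {
        a.serialized_name()
            .encode_utf16()
            .cmp(b.serialized_name().encode_utf16())
    });
    Ok(emitted.iter().map(|f| f.serialized_name()).collect())
}

fn main() {
    let fields = [
        Field { ident: "user_id", rename: Some("id"), skip: false, flatten: false },
        Field { ident: "name", rename: None, skip: false, flatten: false },
        Field { ident: "cache", rename: None, skip: true, flatten: false },
        Field { ident: "created", rename: Some("createdAt"), skip: false, flatten: false },
    ];
    println!("{:?}", sorted_serialization_order(&fields)); // Ok(["createdAt", "id", "name"])
}
```

Sorting on the renamed name is what makes the output independent of both declaration order and Rust identifier changes.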

If these features are acceptable, I'm happy to draft a PR for it.

Unsolved question: do we also need an attribute to enforce deserialization field order as the definition order?

oli-obk commented 5 days ago

Isn't this useless as a shallow attribute that only enforces the order for the struct you are serializing, but not the structs in the fields?

In that case you could also just have a further custom derive that doesn't do anything but error if the field order is wrong in the struct decl
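A check-only derive of that kind could expand to nothing more than a compile-time assertion. In the std-only Rust sketch below the assertion is written by hand (a derive would generate the name list from the struct declaration), and `str_lt` and `is_strictly_sorted` are hypothetical helpers. It compares names as written, so a real derive would also need to account for serde(rename).

```rust
/// Byte-wise `a < b` for string slices, usable in const context.
const fn str_lt(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let mut i = 0;
    while i < a.len() && i < b.len() {
        if a[i] != b[i] {
            return a[i] < b[i];
        }
        i += 1;
    }
    // One is a prefix of the other: the shorter one sorts first.
    a.len() < b.len()
}

/// True if the names are in strictly increasing order.
const fn is_strictly_sorted(names: &[&str]) -> bool {
    let mut i = 1;
    while i < names.len() {
        if !str_lt(names[i - 1], names[i]) {
            return false;
        }
        i += 1;
    }
    true
}

struct Config {
    alpha: u32,
    beta: bool,
    gamma: String,
}

// What the hypothetical derive might emit for `Config`: reordering the
// fields (and this list) so they are no longer sorted fails to compile.
const _: () = assert!(is_strictly_sorted(&["alpha", "beta", "gamma"]));

fn main() {
    let c = Config { alpha: 1, beta: true, gamma: String::from("on") };
    println!("{} {} {}", c.alpha, c.beta, c.gamma);
}
```

Because serde already serializes in definition order, such a check gives sorted output for that one struct with no runtime cost, though it does nothing for the generated deserialization code.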

Mingun commented 5 days ago

> In that case you could also just have a further custom derive that doesn't do anything but error if the field order is wrong in the struct decl

I think that's not an option, given the goals (reducing compile time and binary size by removing the code that handles arbitrary order).

oli-obk commented 4 days ago

There are two separate topics here: sorting and enforced order. If we address them, we should do so independently of each other.

IMO, once you reach this level of special-casing types for a particular serializer, you should no longer be using a general-purpose serializer like serde. Instead, write a derive that is tuned directly to your format rather than being generic over formats.