mongodb / bson-rust

Encoding and decoding support for BSON in Rust
MIT License
400 stars 132 forks source link

RUST-426 Fix Decimal128 human-readable serialization #479

Closed abr-egn closed 3 months ago

abr-egn commented 3 months ago

RUST-426

This is a bugfix tangential to working on RUST-426; it conveniently showcases the complexity of issues the current state of the codebase can cause.

It turns out that in the corpus tests, along with the spec-mandated comparisons, we also assert that various Rust-specific conversion identities hold. The one failing in this case was:

let bytes: Vec[u8] = ...;  // from corpus
let inner = Document::from_reader(bytes);
let outer = bson::to_document(inner);
let mut new_bytes = Vec::new();
outer.to_writer(&mut new_bytes);
assert_eq!(bytes, new_bytes);

The bytes in the corpus deserialized to a doc with a Decimal128 value of -NaN. The failure occurs when calling bson::to_document on a value that is itself Document:

  1. bson::to_document flags serialization as human-readable.
  2. Serializing Decimal128 to a human-readable destination causes it to emit an extjson representation (like the rest of our Bson types)`
  3. The spec for bson decimal128 support gives the string representation of -NaN as just NaN, so the extjson here is { $numberDecimal: NaN }
  4. In order to round-trip types, bson::to_document parses extjson representations in the serialized values, so it gets back a decimal128 that's non-negative.
  5. That doesn't produce the same serialized bytes as the original πŸ™ƒ

More generally, what this means is that these three properties cannot all be true at once:

I believe the correct property to drop is the last one; the corpus already flags some of the tests (including the failing one here) as lossy, so there's clearly an acknowledgement there that extjson cannot always precisely correspond to the binary form.

We could work around that if there were some way for a serializer to flag to a data type that it should use special logic, but I can't find any way to do that in Serde 😞 There are various hacks to provide side-channel data in other directions:

... but I can't find any way for serialization format to change based on a serializer beyond the one-bit is_human_readable flag.

abr-egn commented 3 months ago

More generally, what this means is that these three properties cannot all be true at once:

  • The human-readable serialized form of bson types is extjson.
  • bson::to_document is human-readable.
  • bson::to_document(Document) is an identity function.

I believe the correct property to drop is the last one; the corpus already flags some of the tests (including the failing one here) as lossy, so there's clearly an acknowledgement there that extjson cannot always precisely correspond to the binary form.

There's also a reasonable argument to be made that the right property to drop here is the first one: if you want extjson, you should use the explicit Bson::into_[relaxed|canonical]_extjson and that we don't guarantee anything about the Serde data model representation of the data types. That would allow a lot of simplification in the code but changing the serialized format, even in a backwards-compatible way, makes me nervous.

abr-egn commented 3 months ago

Yeah, I think we should consider a 3.0 of this library if we plan to change serialized formats. IMO significantly changing the outputs of our serialization methods would be more disruptive than API changes since users wouldn't have obvious compiler errors to fix when upgrading. The timing is poor for this though since we just 3.0-ed the driver :/

We're definitely approaching a point where there's a reasonable amount of things in the bson crate that require a 3.0 release to improve. I am reasonably confident that we can use the future-proofing feature setup in the driver crate to introduce a bson 3.0 without having to go to 4.0 in the mongodb crate, at the cost of new users defaulting to bson 2.x and having to maintain some compatibility shims internally.

I filed https://jira.mongodb.org/browse/RUST-1985 to track this for bson-3.0.