Closed abr-egn closed 5 months ago
More generally, what this means is that these three properties cannot all be true at once:
- The human-readable serialized form of bson types is extjson.
- `bson::to_document` is human-readable.
- `bson::to_document(Document)` is an identity function.

I believe the correct property to drop is the last one; the corpus already flags some of the tests (including the failing one here) as lossy, so there's clearly an acknowledgement there that extjson cannot always precisely correspond to the binary form.
There's also a reasonable argument to be made that the right property to drop here is the first one: if you want extjson, you should use the explicit `Bson::into_[relaxed|canonical]_extjson`, and we don't guarantee anything about the Serde data model representation of the data types. That would allow a lot of simplification in the code, but changing the serialized format, even in a backwards-compatible way, makes me nervous.
Yeah, I think we should consider a 3.0 of this library if we plan to change serialized formats. IMO significantly changing the outputs of our serialization methods would be more disruptive than API changes since users wouldn't have obvious compiler errors to fix when upgrading. The timing is poor for this though since we just 3.0-ed the driver :/
We're definitely approaching a point where there's a reasonable number of things in the bson crate that require a 3.0 release to improve. I am reasonably confident that we can use the future-proofing feature setup in the driver crate to introduce a `bson` 3.0 without having to go to 4.0 in the `mongodb` crate, at the cost of new users defaulting to `bson` 2.x and having to maintain some compatibility shims internally.
I filed https://jira.mongodb.org/browse/RUST-1985 to track this for `bson-3.0`.
RUST-426
This is a bugfix tangential to working on RUST-426; it conveniently showcases the complexity of issues the current state of the codebase can cause.
- `Decimal128` had both inline serialization code in `Bson::serialize` and its own `Serialize` implementation.
- `Decimal128::serialize` uses the proper extjson representation when the serializer is human-readable, whereas `Bson::serialize` would always use a non-human-readable format (a holdover from before we had proper support for Decimal128 values).
- Switching to `Decimal128::serialize` caused a corpus test to start failing.

It turns out that in the corpus tests, along with the spec-mandated comparisons, we also assert that various Rust-specific conversion identities hold. The one failing in this case was:
The bytes in the corpus deserialized to a doc with a `Decimal128` value of `-NaN`. The failure occurs when calling `bson::to_document` on a value that is itself `Document`:
- `bson::to_document` flags serialization as human-readable.
- Serializing `Decimal128` to a human-readable destination causes it to emit an extjson representation (like the rest of our `Bson` types).
- `-NaN` is rendered as just `NaN`, so the extjson here is `{ $numberDecimal: NaN }`.
- `bson::to_document` parses extjson representations in the serialized values, so it gets back a decimal128 that's non-negative.

More generally, what this means is that these three properties cannot all be true at once:
- The human-readable serialized form of bson types is extjson.
- `bson::to_document` is human-readable.
- `bson::to_document(Document)` is an identity function.

I believe the correct property to drop is the last one; the corpus already flags some of the tests (including the failing one here) as lossy, so there's clearly an acknowledgement there that extjson cannot always precisely correspond to the binary form.
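To make the conflict concrete, here is a minimal, standard-library-only sketch of the same shape of failure. Everything in it (`ToyDecimal`, `to_text`, `from_text`, `round_trip`) is a hypothetical stand-in invented for illustration, not the bson crate's API: the text form drops the sign of a NaN, so a conversion that goes through the text form cannot be an identity for `-NaN`.

```rust
// Toy model only: these names are hypothetical stand-ins, not the bson crate's API.

/// A drastically simplified decimal: a finite integer, or a NaN that still
/// carries a sign bit (as the binary encoding does).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyDecimal {
    Finite(i64),
    NaN { negative: bool },
}

/// Human-readable form. The sign of a NaN is dropped here, mirroring how
/// `-NaN` ends up written as just `NaN`.
fn to_text(d: ToyDecimal) -> String {
    match d {
        ToyDecimal::Finite(v) => v.to_string(),
        ToyDecimal::NaN { .. } => "NaN".to_string(),
    }
}

/// Parse the human-readable form back. A bare "NaN" carries no sign, so the
/// only thing it can become is a non-negative NaN.
fn from_text(s: &str) -> Option<ToyDecimal> {
    if s == "NaN" {
        Some(ToyDecimal::NaN { negative: false })
    } else {
        s.parse::<i64>().ok().map(ToyDecimal::Finite)
    }
}

/// Stand-in for a conversion that serializes human-readably and then parses
/// the result back into a value.
fn round_trip(d: ToyDecimal) -> Option<ToyDecimal> {
    from_text(&to_text(d))
}

fn main() {
    let finite = ToyDecimal::Finite(42);
    let neg_nan = ToyDecimal::NaN { negative: true };

    // Finite values survive the round trip unchanged.
    assert_eq!(round_trip(finite), Some(finite));
    // The sign of the NaN does not, so the round trip is not an identity.
    assert_ne!(round_trip(neg_nan), Some(neg_nan));

    println!("{:?} -> {:?}", neg_nan, round_trip(neg_nan));
}
```

In this toy, keeping the text form and keeping the round trip through it forces giving up the identity property, which is the same trade-off argued for above.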
We could work around that if there were some way for a serializer to flag to a data type that it should use special logic, but I can't find any way to do that in Serde. There are various hacks to provide side-channel data in other directions:
- [...] `Visitor` impl based on that.

... but I can't find any way for the serialization format to change based on a serializer beyond the one-bit `is_human_readable` flag.
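For readers less familiar with that flag, the sketch below shows the pattern in miniature using only the standard library. Serde's real `Serializer` trait does expose an `is_human_readable` method returning `bool`; the `ToySerializer`, `ToySerialize`, `Id`, `Recorder`, and `encode` names here are hypothetical and exist only for this illustration. The point is that the value can branch on exactly one boolean, so there is no room to signal a third mode such as "human-readable, but keep the lossless form".

```rust
// Hypothetical miniature of the pattern; these are not Serde's traits.

/// The only format information a value can ask its serializer for is one bool.
trait ToySerializer {
    fn is_human_readable(&self) -> bool;
    fn write_str(&mut self, s: &str);
    fn write_bytes(&mut self, b: &[u8]);
}

trait ToySerialize {
    fn serialize<S: ToySerializer>(&self, ser: &mut S);
}

/// A small binary identifier with two possible encodings.
struct Id([u8; 4]);

impl ToySerialize for Id {
    fn serialize<S: ToySerializer>(&self, ser: &mut S) {
        if ser.is_human_readable() {
            // Text destinations get a hex string.
            let hex: String = self.0.iter().map(|b| format!("{:02x}", b)).collect();
            ser.write_str(&hex);
        } else {
            // Binary destinations get the raw bytes.
            ser.write_bytes(&self.0);
        }
    }
}

/// A serializer that just records whatever the value writes.
struct Recorder {
    human_readable: bool,
    out: Vec<u8>,
}

impl ToySerializer for Recorder {
    fn is_human_readable(&self) -> bool {
        self.human_readable
    }
    fn write_str(&mut self, s: &str) {
        self.out.extend_from_slice(s.as_bytes());
    }
    fn write_bytes(&mut self, b: &[u8]) {
        self.out.extend_from_slice(b);
    }
}

/// Serialize `id` with the given flag and return the recorded output.
fn encode(id: &Id, human_readable: bool) -> Vec<u8> {
    let mut rec = Recorder { human_readable, out: Vec::new() };
    id.serialize(&mut rec);
    rec.out
}

fn main() {
    let id = Id([0xde, 0xad, 0xbe, 0xef]);
    assert_eq!(encode(&id, true), b"deadbeef".to_vec());
    assert_eq!(encode(&id, false), vec![0xde_u8, 0xad, 0xbe, 0xef]);
}
```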