mongodb / bson-rust

Encoding and decoding support for BSON in Rust
MIT License

Serialising and deserialising a BSON document with Regex can change the document #434

Closed aijnn closed 8 months ago

aijnn commented 9 months ago

Versions/Environment

  1. What version of Rust are you using? 1.72.1
  2. What operating system are you using? macOS
  3. What versions of the driver and its dependencies are you using? 2.7.0
  4. What version of MongoDB are you using? N/A
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)? N/A

Describe the bug

Serialising and then deserialising a BSON document produces a different document if the document contains a Regex with unsorted options:

doc != deserialise(serialise(doc))

This can be demonstrated with the following example, examples/regex.rs:

use bson::Document;
use std::io::Cursor;

fn main() {
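    // Hand-built BSON document: int32 length (11), element type 0x0B (regex),
    // empty key, empty pattern, options "ba", then the document terminator.
    let bytes1: &[u8] = b"\x0b\x00\x00\x00\x0b\x00\x00ba\x00\x00";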

    let doc1 = Document::from_reader(&mut Cursor::new(&bytes1[..])).unwrap();
    let mut bytes2 = Vec::new();
    doc1.to_writer(&mut bytes2).unwrap();
    let doc2 = Document::from_reader(&mut Cursor::new(&bytes2[..])).unwrap();

    println!("Bytes1 ({}): {:?}", bytes1.len(), bytes1);
    println!("Bytes2 ({}): {:?}", bytes2.len(), bytes2);
    println!("Doc1: {:?}", doc1);
    println!("Doc2: {:?}", doc2);
}

Which produces the following output:

# cargo run --example regex
...
Bytes1 (11): [11, 0, 0, 0, 11, 0, 0, 98, 97, 0, 0]
Bytes2 (11): [11, 0, 0, 0, 11, 0, 0, 97, 98, 0, 0]
Doc1: Document({"": Regex { pattern: "", options: "ba" }})
Doc2: Document({"": Regex { pattern: "", options: "ab" }})

Expected behavior

Serialising and then deserialising always produces an identical document.

Actual behavior

Serialising and then deserialising a document containing a Regex with unsorted options produces a different document.

Why this may be happening

Serialisation sorts Regex options. Note that deserialisation does not seem to sort options.
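
The asymmetry described above can be modelled with a minimal, standard-library-only sketch. The function names `sort_options`, `write_options` and `read_options` are hypothetical and are not the crate's API; the sketch only assumes that the write path sorts the option characters while the read path keeps them as received.

```rust
/// Returns the option flags in alphabetical order, e.g. "ba" becomes "ab".
fn sort_options(options: &str) -> String {
    let mut chars: Vec<char> = options.chars().collect();
    chars.sort_unstable();
    chars.into_iter().collect()
}

/// Hypothetical write path: options are sorted before being emitted.
fn write_options(options: &str) -> String {
    sort_options(options)
}

/// Hypothetical read path: options are kept exactly as they were received.
fn read_options(raw: &str) -> String {
    raw.to_owned()
}
```

Under these assumptions, options that are already sorted survive a round trip unchanged, while unsorted options such as "ba" come back as "ab", which matches the Doc1/Doc2 difference in the output above.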

To reproduce

  1. Start with BSON document: {"": Regex { pattern: "", options: "ba" }}
  2. Serialise, result is: 0x0b0000000b000061620000
  3. Deserialise, result is: {"": Regex { pattern: "", options: "ab" }}
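
To make the byte strings in these steps easier to read, here is a rough sketch that decodes a document holding exactly one Regex element by hand, using only the standard library. It follows the BSON layout (int32 length, element type 0x0B, key, pattern and options as NUL-terminated strings, trailing 0x00). The helper names `read_cstring` and `parse_single_regex` are made up for illustration and are not part of the bson crate.

```rust
/// Reads a NUL-terminated UTF-8 string starting at `pos`.
/// Returns the string and the index just past its terminating NUL.
fn read_cstring(bytes: &[u8], pos: usize) -> Option<(String, usize)> {
    let len = bytes.get(pos..)?.iter().position(|&b| b == 0)?;
    let s = std::str::from_utf8(&bytes[pos..pos + len]).ok()?.to_owned();
    Some((s, pos + len + 1))
}

/// Decodes a document containing exactly one regex element
/// into (key, pattern, options). Returns None for any other input.
fn parse_single_regex(bytes: &[u8]) -> Option<(String, String, String)> {
    if bytes.len() < 5 {
        return None;
    }
    // Little-endian int32: total size, including this field and the final NUL.
    let declared = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    if declared < 0 || declared as usize != bytes.len() {
        return None;
    }
    // Element type 0x0B marks a regular expression.
    if bytes[4] != 0x0B {
        return None;
    }
    let (key, pos) = read_cstring(bytes, 5)?;
    let (pattern, pos) = read_cstring(bytes, pos)?;
    let (options, pos) = read_cstring(bytes, pos)?;
    // Exactly one 0x00 byte must remain: the document terminator.
    if pos + 1 != bytes.len() || bytes[pos] != 0 {
        return None;
    }
    Some((key, pattern, options))
}
```

Applied to the two 11-byte sequences in this report, the only field that differs is the options string: "ba" in the original bytes and "ab" in the re-serialised bytes.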

Suggested fix

Either:

  1. Refuse to (de)serialise a BSON document containing a Regex with unsorted options, since such a document is not valid according to the specification
  2. (De)serialise a BSON document containing a Regex with unsorted options as is, without any sorting

abr-egn commented 9 months ago

Is this particular round-trip inequality causing a specific issue?

In general, we prefer to err on the side of accepting input where possible and producing spec-conforming output; this kind of situation is a side effect of that.

github-actions[bot] commented 8 months ago

There has not been any recent activity on this ticket, so we are marking it as stale. If we do not hear anything further from you, this issue will be automatically closed in one week.

github-actions[bot] commented 8 months ago

There has not been any recent activity on this ticket, so we are closing it. Thanks for reaching out and please feel free to file a new issue if you have further questions.