mongodb / bson-rust

Encoding and decoding support for BSON in Rust
MIT License

document has bytes remaining that were not visited #481

Closed: LinusU closed this issue 2 months ago

LinusU commented 3 months ago

Versions/Environment

  1. What version of Rust are you using? 1.78.0
  2. What operating system are you using? Amazon Linux 2
  3. What versions of the driver and its dependencies are you using?
    • registry+https://github.com/rust-lang/crates.io-index#mongodb@3.0.0
    • registry+https://github.com/rust-lang/crates.io-index#bson@2.11.0
  4. What version of MongoDB are you using? 5.0.26
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)? replica set

Describe the bug

For one specific document in our database, we are getting the following error:

document has bytes remaining that were not visited: 2344

I have tried dumping the document as BSON from the database, and I can read that just fine. However, we are using a projection, so my working theory is that our database server is returning invalid BSON for that specific projection on that specific query 🤔

We have ~6.8 million documents that work, and one that doesn't 😅

We also tried deleting the document and inserting it again, but the same error still occurs when we query it with a projection.

I would love some help debugging this, e.g. how I can dump the exact BSON that's being parsed.

To Reproduce

Unfortunately, I haven't figured out how to reproduce this yet.

isabelatkinson commented 3 months ago

Hey @LinusU, thanks for opening this issue! To clarify my understanding, are the following correct?

To determine whether the issue is coming from the driver specifically, can you run the query with the projection that's giving you problems in mongosh? If that succeeds, then there's likely a bug in our deserialization logic; the error you're receiving comes from this line in the Rust BSON library. If that's the case, whatever information you can provide for us about the projected document would help in diagnosing what's going wrong on our end.

isabelatkinson commented 3 months ago

One more question: what is the generic type of the collection you're using for the query? Does the behavior change when using Collection<Document> or Collection<RawDocumentBuf>?
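To capture the exact BSON being parsed (as asked in the original report), one option, assuming the driver's cursor API, is to call `cursor.current().as_bytes()` inside a `while cursor.advance().await?` loop before `deserialize_current()` runs, or to query through `Collection<RawDocumentBuf>` and use `as_bytes()` on each result. Those bytes could then be saved to a file, or fed to `bson::from_slice` to try deserializing offline (whether that takes exactly the same code path as the driver is an assumption worth checking). The helper below is a hypothetical, std-only sketch that formats such bytes as a readable hex dump:

```rust
/// Hypothetical debugging helper: format raw bytes (e.g. a BSON document)
/// as a classic hex dump, with a hex offset column, 16 bytes per row in
/// hex, then the printable ASCII characters ('.' for everything else).
fn hex_dump(bytes: &[u8]) -> String {
    let mut out = String::new();
    for (row, chunk) in bytes.chunks(16).enumerate() {
        // Offset of the first byte in this row.
        out.push_str(&format!("{:08x}  ", row * 16));
        for b in chunk {
            out.push_str(&format!("{:02x} ", b));
        }
        // Pad a short final row so the ASCII column lines up.
        for _ in chunk.len()..16 {
            out.push_str("   ");
        }
        out.push(' ');
        for &b in chunk {
            out.push(if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' });
        }
        out.push('\n');
    }
    out
}
```

For example, `hex_dump(&[5, 0, 0, 0, 0])` (the encoding of an empty BSON document) produces a single row starting with `00000000  05 00 00 00 00`. Key names and string values show up in the ASCII column, which makes it easier to spot which field sits at the offset where deserialization fails.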

github-actions[bot] commented 2 months ago

There has not been any recent activity on this ticket, so we are marking it as stale. If we do not hear anything further from you, this issue will be automatically closed in one week.

github-actions[bot] commented 2 months ago

There has not been any recent activity on this ticket, so we are closing it. Thanks for reaching out and please feel free to file a new issue if you have further questions.

LinusU commented 2 months ago

@isabelatkinson sorry for the late reply; this came in right at the start of my vacation.

This has now happened to another document.

It seems the projection was a red herring: after more investigation, I've concluded that this happens with or without it. I've managed to trim it down to the code below:

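```rust
use mongodb::{bson::doc, Client};

// `Order` is our own struct implementing Serde's Deserialize (not shown).
#[tokio::main]
async fn main() -> Result<(), lambda_runtime::Error> {
    let client = Client::with_uri_str("mongodb+srv://......").await?;
    // Uses the default database named in the connection string.
    let db = client.default_database().unwrap();

    let mut cursor = db
        .collection::<Order>("Order")
        .find(doc! { "_id": "xyz123" })
        .await?;

    // `advance` fetches the next raw document; `deserialize_current`
    // converts it into `Order`, which is where the error surfaces.
    while cursor.advance().await? {
        println!("Error is on the line below");
        cursor.deserialize_current()?;
        println!("Error is on the line above");
    }

    Ok(())
}
```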

This code gives the following output:

```text
Error is on the line below
Error: Error { kind: BsonDeserialization(DeserializationError { message: "document has bytes remaining that were not visited: 2109" }), labels: {}, wire_version: None, source: None }
```

`Order` is a custom struct that implements Serde's deserialization traits. Using `Collection<Document>` or `Collection<RawDocumentBuf>` instead works without an error.

With this information I was able to further narrow it down to the following line:

```rust
#[serde(borrow)]
pub refund_receipts: Option<[OrderReceipt<'a>; 1]>,
```

This code wasn't built to handle anything other than zero or one refund receipt (the field is either absent or a fixed-size array of exactly one), but the document in question had two. So in a sense this was user error, just one that was hard to understand and track down 😅
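Why a two-element array produces this particular message can be sketched with a toy model. The assumption here (not confirmed in this thread) is that serde deserializes `[T; 1]` by requesting exactly one element and then stopping, so the second receipt's bytes are never visited, and the BSON deserializer's end-of-document check reports them as "bytes remaining". The code below does not use the bson crate: `encode_int32_array` and `read_exactly` are hypothetical helpers that lay out a BSON array of int32 values and mimic a fixed-count reader, and the byte count they report need not match the real crate's.

```rust
use std::convert::TryInto;

// Toy model only, NOT the bson crate's code: it lays out a BSON array of
// int32 values and mimics a reader that expects a fixed element count.

/// Encode `values` as a BSON array: an i32 total length (which counts
/// itself), one element per value (type byte 0x10 = int32, key "0", "1", ...
/// as a NUL-terminated string, little-endian i32 payload), then 0x00.
fn encode_int32_array(values: &[i32]) -> Vec<u8> {
    let mut body = Vec::new();
    for (i, v) in values.iter().enumerate() {
        body.push(0x10);
        body.extend_from_slice(i.to_string().as_bytes());
        body.push(0x00);
        body.extend_from_slice(&v.to_le_bytes());
    }
    body.push(0x00);
    let total = (body.len() + 4) as i32;
    let mut out = total.to_le_bytes().to_vec();
    out.extend(body);
    out
}

/// Read exactly `n` elements, the way a target like `[T; 1]` asks for a
/// fixed count, then fail if any element bytes were left unread.
fn read_exactly(bytes: &[u8], n: usize) -> Result<Vec<i32>, String> {
    let total = i32::from_le_bytes(bytes[0..4].try_into().unwrap()) as usize;
    let mut pos = 4;
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        if bytes[pos] == 0x00 {
            // Hit the terminator early: the array is shorter than expected.
            return Err(format!("invalid length {}, expected {}", out.len(), n));
        }
        pos += 1; // skip the type byte (always int32 in this toy)
        while bytes[pos] != 0x00 {
            pos += 1; // skip the key
        }
        pos += 1; // skip the key's NUL terminator
        out.push(i32::from_le_bytes(bytes[pos..pos + 4].try_into().unwrap()));
        pos += 4;
    }
    // Bytes between the current position and the final 0x00 were never visited.
    let remaining = total - 1 - pos;
    if remaining != 0 {
        return Err(format!(
            "document has bytes remaining that were not visited: {}",
            remaining
        ));
    }
    Ok(out)
}
```

In this toy, `read_exactly(&encode_int32_array(&[10, 20]), 1)` fails with "document has bytes remaining that were not visited: 7" (the unread second element), while reading both elements succeeds. Under the same assumption, a field meant to accept any number of receipts would use a growable type such as `Option<Vec<OrderReceipt<'a>>>` rather than a fixed-size array.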


I think it would be awesome if the error message could be improved, but in the meantime I'm able to work around this myself. Thanks for your help!

isabelatkinson commented 2 months ago

Thanks for the additional information! Agreed that this is not a useful error message; I filed RUST-2007 to investigate how to improve it.

LinusU commented 2 months ago

Thanks! 🙏