Closed tyilo closed 2 months ago
Also consider making this the default to improve interoperability with other drivers.
Thanks for the contribution! I don't think we'll want to make this the default to avoid breaking anyone relying on the existing validation, but it seems quite helpful as an opt-in feature.
If you're willing, could you share more information about the situation that motivates this? The bson spec says that strings are UTF-8, so at least in theory drivers shouldn't be writing values that require this.
If you're willing, could you share more information about the situation that motivates this? The bson spec says that strings are UTF-8, so at least in theory drivers shouldn't be writing values that require this.
Yes, I agree that in a perfect world this would be rejected by all drivers and also the MongoDB server itself.
However, at least the Java driver doesn't always produce valid UTF-8 when writing to a collection. See https://jira.mongodb.org/projects/JAVA/issues/JAVA-5575 which has been closed as "Won't Fix".
Hi @tyilo, apologies for the delay here! UTF-8 lossy deserialization is a concept that we'd like to keep contained to the bson
library if possible. Would the wrapper type added in https://github.com/mongodb/bson-rust/pull/497 work for you rather than exposing this feature flag?
Here's the basic idea of what using this type would look like:
struct HasLossyString {
// this string might have invalid utf-8
s: Utf8LossyDeserialization<String>,
}
let collection: Collection<HasLossyString> = client.database("db").collection("coll");
// documents with invalid utf-8 strings in field s will not error
let cursor = collection.find(filter).await?;
// both s1 and s2 might have invalid utf-8
struct HasLossyStrings {
s1: String,
s2: String,
}
let collection: Collection<Utf8LossyDeserialization<HasLossyStrings>> = client.database("db").collection("coll");
// documents with any invalid utf-8 string values will not error
let cursor = collection.find(filter).await?;
You could also use this type with Document
/RawDocumentBuf
.
@isabelatkinson Seems to work great.
@tyilo great to hear. I just merged in the addition to the BSON library, and it will be included in that crate's next release. Going to close this out - thanks for bringing this issue to our attention!
Useful if you need to read from a collection created by a driver for another programming language.
See #799