xxated closed this issue 2 years ago
Here's another set of benchmarks that might be helpful.
Nice! But I don't see BSON tests here, am I blind?😂
I can add them for comparison, the benchmark suggestion was mostly to evaluate other formats.
If you can that would be extremely helpful!
The benchmarks have been updated with numbers for bson. It's using a modified version of to_vec that avoids reallocations (PR) for a fair comparison.
BSON is the format that MongoDB uses both for data storage and to communicate with drivers, so it won't be possible to change the driver to use another format. You can greatly speed up driver performance by utilizing a T that isn't Document in your collections, though (this isn't reflected in the NoProto benchmarks). For example:
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, Debug)]
struct MyType { /* fields here */ }

let coll = db.collection::<MyType>("my_coll");
coll.insert_one(MyType::new(...), None).await?;
let mt: MyType = coll.find_one(doc! {}, None).await?.unwrap();
Also, we're currently working on introducing a number of raw-BSON wrapper types, borrowing a lot of code from the rawbson crate. Once that's done, you'll be able to perform borrowed deserialization, which will be even faster:
#[derive(Debug, Deserialize, Serialize)]
struct MyTypeRef<'a> {
    some_borrowed_field: &'a str,
}

let coll = db.collection::<RawDocumentBuf>("my_coll");
coll.insert_one(bson::to_raw_document_buf(&MyType::new(...))?, None).await?;
let rawdoc: RawDocumentBuf = coll.find_one(doc! {}, None).await?.unwrap();
let mt: MyTypeRef = bson::from_slice(rawdoc.as_bytes())?;
BSON won't ever reach the speeds of NoProto or some of these other high-performance serialization formats due to its dynamic, self-describing nature, but for most driver use cases this won't really matter, since the majority of the driver's execution time will be spent on network I/O between the driver and the server, with (de)serialization being negligible in comparison. That being said, we're always striving to improve the performance of bson, so if you have any specific workloads that seem slower than they ought to be, we'd love to hear about them!
Thank you very much Patrick! Network I/O will not be a bottleneck since we're hosting the server and MongoDB with a cloud Kubernetes provider in the same datacenter, so for us the bottleneck is the driver.
I've got a few further questions:
1) Just to confirm, the performance boost in the first example comes from serializing the struct directly into BSON instead of constructing a Document first?
2) Would using a struct for the find_one filter boost performance as well?
3) Are there plans to use bson::to_raw_document_buf implicitly every time a struct is received (mostly to avoid repetitive code), or is there a breaking change in raw BSON that would prevent that?
4) Will it be possible to convert arrays/iterables of structs into raw BSON directly?
Just to confirm, the performance boost in the first example comes from serializing the struct directly into BSON instead of constructing a Document first?

Yep, and it also deserializes directly from BSON without having to go through Document in read operations. For queries that return a lot of results, this can make a big difference.
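To make that concrete, a read path that never touches Document might look like the following sketch (MyType, the database and collection names are illustrative; it assumes the 2.x driver, where Cursor implements futures::Stream):

use futures::stream::TryStreamExt;
use mongodb::{bson::doc, Client};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct MyType {
    name: String,
    count: i64,
}

async fn read_all(client: &Client) -> mongodb::error::Result<Vec<MyType>> {
    // Collection<MyType> deserializes each result straight from the raw
    // BSON bytes into MyType, without building an intermediate Document.
    let coll = client.database("db").collection::<MyType>("my_coll");
    let cursor = coll.find(doc! {}, None).await?;
    cursor.try_collect().await
}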
Would using a struct for find_one filter boost performance as well?
Our API currently doesn't support doing this, but I wouldn't think so unless the filter was really huge.
Are there plans to use bson::to_raw_document_buf implicitly every time a struct is received (mostly to avoid repetitive code), or is there a breaking change in raw BSON that would prevent that?
I invoked that explicitly in that example so that I could use a single Collection for both writes and reads, since Collection is generic over a single type that's used for both. You can have the driver automatically invoke to_raw_document_buf for you, though (and it's potentially faster to do this), by using a Collection<MyType> for inserts and a Collection<RawDocumentBuf> for borrowed deserialization in reads.
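Put together, that split could look like this sketch (the struct and names are illustrative; clone_with_type re-types the same underlying collection handle, so no extra connection work is needed):

use mongodb::{
    bson::{doc, RawDocumentBuf},
    Client,
};
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct MyType {
    name: String,
}

async fn round_trip(client: &Client) -> anyhow::Result<()> {
    // Writes: the driver serializes MyType directly to BSON.
    let write_coll = client.database("db").collection::<MyType>("my_coll");
    write_coll.insert_one(MyType { name: "a".into() }, None).await?;

    // Reads: re-type the same collection to RawDocumentBuf, then borrow
    // from the raw bytes instead of building an owned Document.
    let read_coll = write_coll.clone_with_type::<RawDocumentBuf>();
    if let Some(raw) = read_coll.find_one(doc! {}, None).await? {
        let mt: MyType = mongodb::bson::from_slice(raw.as_bytes())?;
        println!("{:?}", mt);
    }
    Ok(())
}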
Will it be possible to convert arrays/iterables of structs into raw BSON directly?
Yep, and you can actually do this today via bson::to_vec if the array/iterable is not at the top level, since the only top-level BSON type is a document (unlike JSON, which allows arrays, integers, strings, etc. at the top level).
#[derive(Debug, Serialize)]
struct MyData {
    strings: Vec<String>,
}

let md = MyData { strings: vec!["a".to_string()] };
let raw_bson = bson::to_vec(&md)?; // Vec of BSON bytes whose "strings" field is an array
Once the raw BSON work is done, you'll be able to serialize to a RawBson::Array(RawArrayBuf) value even at the top level, but again, this value is only useful as a field in something else if it's meant to eventually be inserted in the database.
If you're talking about directly serializing iterables of structs for the purposes of inserting them, you can also do that today via Collection::insert_many:
let collection = db.collection::<MyType>("my_coll");
collection.insert_many(vec![
    MyType::new(),
    MyType::new(),
    ...
], None).await?;
This is a lot faster than calling Collection::insert_one in a loop, as it uses far fewer round trips to the database (usually just a single one).
Perfect, thanks very much again @patrickfreed ! Should we leave this issue open for future reference until the raw BSON work is released?
No problem, happy to help!
Leaving this open sounds fine to me. Once the raw BSON stuff is merged, I'll circle back with some updated examples (the API isn't completely set in stone just yet).
Nice work! When will the new version be released?
We've released betas of both the driver and the BSON library which contain support for the raw BSON features I mentioned above. To start using them, update your mongodb dependency in Cargo.toml to 2.2.0-beta and your bson one (if it's there) to 2.2.0-beta.1. If you do try it out, please let us know if you run into any issues!
Note that network latency and DB processing constitute a large amount of the time spent waiting on a query, so you may not see huge performance improvements by using borrowed deserialization instead of regular owned deserialization to a T. If latency is really low and the documents being deserialized are really big, however, there can be significant improvements.
Here's an example program that demonstrates how to use raw BSON with the driver:
use mongodb::{
    bson::{
        rawdoc, spec::BinarySubtype, Binary, RawArray, RawBsonRef, RawDocument, RawDocumentBuf,
    },
    Client,
};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct MyBorrowedData<'a> {
    #[serde(borrow)]
    string: &'a str,
    #[serde(borrow)]
    bin: &'a [u8],
    #[serde(borrow)]
    doc: &'a RawDocument,
    #[serde(borrow)]
    array: &'a RawArray,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::with_uri_str("mongodb://localhost:27017").await?;
    let coll = client.database("foo").collection::<MyBorrowedData>("bar");

    coll.clone_with_type::<RawDocumentBuf>()
        .insert_one(
            rawdoc! {
                "string": "hello world",
                "bin": Binary {
                    bytes: vec![1, 2, 3, 4],
                    subtype: BinarySubtype::Generic
                },
                "doc": {
                    "a": "subdoc",
                    "b": true
                },
                "array": [
                    12,
                    12.5,
                    false
                ]
            },
            None,
        )
        .await?;

    let mut cursor = coll.find(None, None).await?;
    while cursor.advance().await? {
        let data = cursor.deserialize_current()?;
        println!("{:#?}", data);
        println!("doc.a => {}", data.doc.get_str("a")?);
        println!(
            "doc.array => {:#?}",
            data.array
                .into_iter()
                .collect::<mongodb::bson::raw::Result<Vec<RawBsonRef>>>()?
        );
    }

    Ok(())
}
And this prints the following:
MyBorrowedData {
    string: "hello world",
    bin: [
        1,
        2,
        3,
        4,
    ],
    doc: RawDocument {
        data: "1700000002610007000000737562646f63000862000100",
    },
    array: RawArray {
        data: "1b0000001030000c00000001310000000000000029400832000000",
    },
}
doc.a => subdoc
doc.array => [
    Int32(
        12,
    ),
    Double(
        12.5,
    ),
    Boolean(
        false,
    ),
]
Performance is the main concern in my current circumstances. I will update all the related libs and let you know if I face any issues. Thanks so much for your hard work! Really appreciate it!
Hello everyone! My team is currently writing a very traffic-heavy server, so our main goals are performance and security (which are Rust's lead perks). I was extremely happy with Rust's actix-web framework performance before introducing BSON objects. I started reading about this issue and found those benchmarks, and also an alternative for document operations: https://github.com/only-cliches/NoProto. I'm wondering if it's possible to replace BSON with NoProto documents? They seem to have the same functionality, but NoProto is around 160x faster for decodes and 85x faster for updates of a single document.
I understand that Document functionality is one of the core MongoDB features, but using BSON for it is a major performance hit for the Rust driver. Changing it might raise the performance several times!
Thanks for your time and attention!
My bench results: