mongodb / mongo-rust-driver

The official MongoDB Rust Driver
https://mongodb.github.io/mongo-rust-driver/manual
Apache License 2.0

BSON performance #528

Closed · xxated closed this 2 years ago

xxated commented 2 years ago

Hello everyone! My team is currently writing a very traffic-heavy server, so our main goals are performance and security (Rust's lead perks). I was extremely happy with the performance of Rust's actix-web framework before I introduced BSON objects. I started reading about this issue and found some benchmarks, along with an alternative for document operations: https://github.com/only-cliches/NoProto. I'm wondering whether it's possible to replace BSON with NoProto documents? They seem to offer the same functionality, but NoProto is around 160x faster for decodes and 85x faster for updates of a single document.

I understand that Document functionality is one of MongoDB's core features, but using BSON for it is a major performance hit for the Rust driver. Changing it might improve performance severalfold!

Thanks for your time and attention!

My bench results:

========= SIZE BENCHMARK =========
NoProto:     size: 308b, zlib: 198b
Flatbuffers: size: 264b, zlib: 181b
Bincode:     size: 163b, zlib: 129b
Postcard:    size: 128b, zlib: 119b
Protobuf:    size: 154b, zlib: 141b
MessagePack: size: 311b, zlib: 193b
JSON:        size: 439b, zlib: 184b
BSON:        size: 414b, zlib: 216b
Prost:       size: 154b, zlib: 142b
Avro:        size: 702b, zlib: 333b
Flexbuffers: size: 490b, zlib: 309b
Abomonation: size: 261b, zlib: 165b
Rkyv:        size: 180b, zlib: 152b
Raw BSON:    size: 414b, zlib: 216b
MessagePack: size: 296b, zlib: 187b
Serde JSON:  size: 446b, zlib: 198b

======== ENCODE BENCHMARK ========
NoProto:           739 ops/ms 1.00
Flatbuffers:      2710 ops/ms 3.66
Bincode:          9615 ops/ms 13.03
Postcard:         4505 ops/ms 6.10
Protobuf:         1484 ops/ms 2.01
MessagePack:       760 ops/ms 1.03
JSON:              700 ops/ms 0.95
BSON:              196 ops/ms 0.27
Prost:            1773 ops/ms 2.40
Avro:              235 ops/ms 0.32
Flexbuffers:       483 ops/ms 0.65
Abomonation:      5405 ops/ms 7.30
Rkyv:             3690 ops/ms 4.99
Raw BSON:          203 ops/ms 0.28
MessagePack:       284 ops/ms 0.39
Serde JSON:       1167 ops/ms 1.58

======== DECODE BENCHMARK ========
NoProto:          1085 ops/ms 1.00
Flatbuffers:     12821 ops/ms 11.81
Bincode:          6944 ops/ms 6.40
Postcard:         5682 ops/ms 5.22
Protobuf:         1727 ops/ms 1.59
MessagePack:       561 ops/ms 0.52
JSON:              564 ops/ms 0.52
BSON:              164 ops/ms 0.15
Prost:            2625 ops/ms 2.41
Avro:               72 ops/ms 0.07
Flexbuffers:       562 ops/ms 0.52
Abomonation:     83333 ops/ms 73.77
Rkyv:            58824 ops/ms 52.62
Raw BSON:          925 ops/ms 0.85
MessagePack:       376 ops/ms 0.35
Serde JSON:        377 ops/ms 0.35

====== DECODE ONE BENCHMARK ======
NoProto:         30303 ops/ms 1.00
Flatbuffers:    142857 ops/ms 4.24
Bincode:          7407 ops/ms 0.24
Postcard:         6289 ops/ms 0.21
Protobuf:         1751 ops/ms 0.06
MessagePack:       721 ops/ms 0.02
JSON:              714 ops/ms 0.02
BSON:              186 ops/ms 0.01
Prost:            2710 ops/ms 0.09
Avro:               83 ops/ms 0.00
Flexbuffers:     15385 ops/ms 0.50
Abomonation:    333333 ops/ms 10.65
Rkyv:           250000 ops/ms 7.14
Raw BSON:        15625 ops/ms 0.51
MessagePack:       404 ops/ms 0.01
Serde JSON:        375 ops/ms 0.01

====== UPDATE ONE BENCHMARK ======
NoProto:         11494 ops/ms 1.00
Flatbuffers:      2336 ops/ms 0.20
Bincode:          4367 ops/ms 0.38
Postcard:         2674 ops/ms 0.23
Protobuf:          706 ops/ms 0.06
MessagePack:       312 ops/ms 0.03
JSON:              525 ops/ms 0.05
BSON:              136 ops/ms 0.01
Prost:            1121 ops/ms 0.10
Avro:               54 ops/ms 0.00
Flexbuffers:       251 ops/ms 0.02
Abomonation:      5495 ops/ms 0.48
Rkyv:             3247 ops/ms 0.28
Raw BSON:          140 ops/ms 0.01
MessagePack:       215 ops/ms 0.02
Serde JSON:        289 ops/ms 0.03

| Format / Lib                                               | Encode  | Decode All | Decode 1 | Update 1 | Size (bytes) | Size (Zlib) |
|------------------------------------------------------------|---------|------------|----------|----------|--------------|-------------|
| **Runtime Libs**                                           |         |            |          |          |              |             |
| *NoProto*                                                  |         |            |          |          |              |             |
|        [no_proto](https://crates.io/crates/no_proto)       |     739 |       1085 |    30303 |    11494 |          308 |         198 |
| Apache Avro                                                |         |            |          |          |              |             |
|         [avro-rs](https://crates.io/crates/avro-rs)        |     235 |         72 |       83 |       54 |          702 |         333 |
| FlexBuffers                                                |         |            |          |          |              |             |
|     [flexbuffers](https://crates.io/crates/flexbuffers)    |     483 |        562 |    15385 |      251 |          490 |         309 |
| JSON                                                       |         |            |          |          |              |             |
|            [json](https://crates.io/crates/json)           |     700 |        564 |      714 |      525 |          439 |         184 |
|      [serde_json](https://crates.io/crates/serde_json)     |    1167 |        377 |      375 |      289 |          446 |         198 |
| BSON                                                       |         |            |          |          |              |             |
|            [bson](https://crates.io/crates/bson)           |     196 |        164 |      186 |      136 |          414 |         216 |
|         [rawbson](https://crates.io/crates/rawbson)        |     203 |        925 |    15625 |      140 |          414 |         216 |
| MessagePack                                                |         |            |          |          |              |             |
|             [rmp](https://crates.io/crates/rmp)            |     760 |        561 |      721 |      312 |          311 |         193 |
|  [messagepack-rs](https://crates.io/crates/messagepack-rs) |     284 |        376 |      404 |      215 |          296 |         187 |
| **Compiled Libs**                                          |         |            |          |          |              |             |
| Flatbuffers                                                |         |            |          |          |              |             |
|     [flatbuffers](https://crates.io/crates/flatbuffers)    |    2710 |      12821 |   142857 |     2336 |          264 |         181 |
| Bincode                                                    |         |            |          |          |              |             |
|         [bincode](https://crates.io/crates/bincode)        |    9615 |       6944 |     7407 |     4367 |          163 |         129 |
| Postcard                                                   |         |            |          |          |              |             |
|        [postcard](https://crates.io/crates/postcard)       |    4505 |       5682 |     6289 |     2674 |          128 |         119 |
| Protocol Buffers                                           |         |            |          |          |              |             |
|        [protobuf](https://crates.io/crates/protobuf)       |    1484 |       1727 |     1751 |      706 |          154 |         141 |
|           [prost](https://crates.io/crates/prost)          |    1773 |       2625 |     2710 |     1121 |          154 |         142 |
| Abomonation                                                |         |            |          |          |              |             |
|     [abomonation](https://crates.io/crates/abomonation)    |    5405 |      83333 |   333333 |     5495 |          261 |         165 |
| Rkyv                                                       |         |            |          |          |              |             |
|            [rkyv](https://crates.io/crates/rkyv)           |    3690 |      58824 |   250000 |     3247 |          180 |         152 |
djkoloski commented 2 years ago

Here's another set of benchmarks that might be helpful.

xxated commented 2 years ago

> Here's another set of benchmarks that might be helpful.

Nice! But I don't see BSON tests here, am I blind?😂

djkoloski commented 2 years ago

I can add them for comparison, the benchmark suggestion was mostly to evaluate other formats.

xxated commented 2 years ago

> I can add them for comparison, the benchmark suggestion was mostly to evaluate other formats.

If you can that would be extremely helpful!

djkoloski commented 2 years ago

The benchmarks have been updated with numbers for bson. It's using a modified version of to_vec that avoids reallocations (PR) for a fair comparison.

patrickfreed commented 2 years ago

BSON is the format that MongoDB uses both for data storage and to communicate with drivers, so it won't be possible to change the driver to use another format. You can greatly speed up driver performance by utilizing a T that isn't Document in your collections though (this isn't reflected in the NoProto benchmarks).

e.g.

use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, Debug)]
struct MyType { /* fields here */ }

// The collection is typed to MyType, so reads and writes (de)serialize
// directly between MyType and BSON with no intermediate Document.
let coll = db.collection::<MyType>("my_coll");
coll.insert_one(MyType::new(...), None).await?;
let mt: MyType = coll.find_one(doc! {}, None).await?.unwrap();

Also, we're currently working on introducing a number of raw-BSON wrapper types, borrowing a lot of code from the rawbson crate. Once that's done, you'll be able to perform borrowed deserialization, which will be even faster:

#[derive(Debug, Deserialize, Serialize)]
struct MyTypeRef<'a> {
    some_borrowed_field: &'a str,
}

// Writes go through RawDocumentBuf; reads then deserialize borrowed
// data straight out of the raw bytes without copying.
let coll = db.collection::<RawDocumentBuf>("my_coll");
coll.insert_one(bson::to_raw_document_buf(&MyType::new(...))?, None).await?;
let rawdoc: RawDocumentBuf = coll.find_one(doc! {}, None).await?.unwrap();
let mt: MyTypeRef = bson::from_slice(rawdoc.as_bytes())?;

BSON won't ever reach the speeds of NoProto or some of these other high-performance serialization formats due to its dynamic, self-describing nature. For most driver use cases this won't really matter, though, since the majority of the driver's execution time will be spent on network I/O between the driver and the server, with (de)serialization being negligible in comparison. That said, we're always striving to improve the performance of bson, so if you have any specific workloads that seem slower than they ought to be, we'd love to hear about them!

ConstBur commented 2 years ago

Thank you very much Patrick! Network I/O will not be a bottleneck since we're hosting the server and MongoDB with a cloud Kubernetes provider in the same datacenter, so for us the bottleneck is the driver.

I've got a few further questions:

1. Just to confirm, the performance boost in the first example comes from serializing the struct directly into BSON instead of constructing a Document first?
2. Would using a struct for the find_one filter boost performance as well?
3. Are there plans to use bson::to_raw_document_buf implicitly every time a struct is received (mostly to avoid repetitive code), or is there a breaking change in raw BSON that would prevent that?
4. Will it be possible to convert arrays/iterables of structs into raw BSON directly?

patrickfreed commented 2 years ago

> Just to confirm, the performance boost in the first example comes from serializing the struct directly into BSON instead of constructing a Document first?

Yep, and it also deserializes directly from BSON without going through Document on read operations. For queries that return a lot of results, this can make a big difference.

> Would using a struct for the find_one filter boost performance as well?

Our API currently doesn't support doing this, but I wouldn't think so unless the filter was really huge.

> Are there plans to use bson::to_raw_document_buf implicitly every time a struct is received (mostly to avoid repetitive code), or is there a breaking change in raw BSON that would prevent that?

I invoked that explicitly in that example so that I could use a single Collection for both writes and reads, since Collection is generic over a single type that's used for both. You can have the driver automatically invoke to_raw_document_buf for you though (and it's potentially faster to do this) by using a Collection<MyType> for inserts and a Collection<RawDocumentBuf> for borrowed deserialization in reads.

> Will it be possible to convert arrays/iterables of structs into raw BSON directly?

Yep, and you can actually do this today via bson::to_vec, as long as the array/iterable is not itself the top-level value, since the only top-level BSON type is a document (unlike JSON, which allows arrays, integers, strings, etc. at the top level).

#[derive(Debug, Serialize)]
struct MyData {
    strings: Vec<String>,
}

let md = MyData { strings: vec!["a".to_string()] };
let raw_bson = bson::to_vec(&md)?; // vec of BSON bytes whose "strings" field is an array

Once the raw BSON work is done, you'll be able to serialize to a RawBson::Array(RawArrayBuf) value even at the top level, but again, this value is only useful as a field in something else if it's meant to eventually be inserted into the database.

If you're talking about directly serializing iterables of structs for the purposes of inserting them, you can also do that today via Collection::insert_many:

let collection = db.collection::<MyType>("my_coll");
collection.insert_many(vec![
    MyType::new(),
    MyType::new(),
    ...
], None).await?;

This is a lot faster than calling Collection::insert_one in a loop as it uses a lot fewer round trips to the database (usually just a single one).

xxated commented 2 years ago


Perfect, thanks very much again @patrickfreed ! Should we leave this issue open for future reference until the raw BSON work is released?

patrickfreed commented 2 years ago

No problem, happy to help!

Leaving this open sounds fine to me. Once the raw BSON stuff is merged, I'll circle back with some updated examples (the API isn't completely set in stone just yet).

attila-lin commented 2 years ago

Nice work! When will the new version be released?

patrickfreed commented 2 years ago

We've released betas of both the driver and the BSON library which contain support for the raw BSON features I mentioned above. To start using them, update your mongodb dependency in Cargo.toml to 2.2.0-beta and your bson one (if it's there) to 2.2.0-beta.1. If you do try it out, please let us know if you run into any issues!
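In Cargo.toml, pinning those beta versions would look roughly like this (crate names and version strings as given above):

```toml
[dependencies]
mongodb = "2.2.0-beta"
bson = "2.2.0-beta.1"
```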

Note that network latency and DB processing constitute a large amount of the time spent waiting on a query, so you may not see huge performance improvements by using borrowed deserialization instead of regular owned deserialization to a T. If latency is really low and the documents being deserialized are really big, there can be significant improvements, however.

Here's an example program that demonstrates how to use raw BSON with the driver:

use mongodb::{
    bson::{
        rawdoc, spec::BinarySubtype, Binary, RawArray, RawBsonRef, RawDocument, RawDocumentBuf,
    },
    Client,
};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct MyBorrowedData<'a> {
    #[serde(borrow)]
    string: &'a str,
    #[serde(borrow)]
    bin: &'a [u8],
    #[serde(borrow)]
    doc: &'a RawDocument,
    #[serde(borrow)]
    array: &'a RawArray,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::with_uri_str("mongodb://localhost:27017").await?;
    let coll = client.database("foo").collection::<MyBorrowedData>("bar");

    coll.clone_with_type::<RawDocumentBuf>()
        .insert_one(
            rawdoc! {
                "string": "hello world",
                "bin": Binary {
                    bytes: vec![1, 2, 3, 4],
                    subtype: BinarySubtype::Generic
                },
                "doc": {
                    "a": "subdoc",
                    "b": true
                },
                "array": [
                    12,
                    12.5,
                    false
                ]
            },
            None,
        )
        .await?;

    let mut cursor = coll.find(None, None).await?;
    while cursor.advance().await? {
        let data = cursor.deserialize_current()?;
        println!("{:#?}", data);
        println!("doc.a => {}", data.doc.get_str("a")?);
        println!(
            "doc.array => {:#?}",
            data.array
                .into_iter()
                .collect::<mongodb::bson::raw::Result<Vec<RawBsonRef>>>()?
        );
    }

    Ok(())
}

And this prints the following:

MyBorrowedData {
    string: "hello world",
    bin: [
        1,
        2,
        3,
        4,
    ],
    doc: RawDocument {
        data: "1700000002610007000000737562646f63000862000100",
    },
    array: RawArray {
        data: "1b0000001030000c00000001310000000000000029400832000000",
    },
}
doc.a => subdoc
doc.array => [
    Int32(
        12,
    ),
    Double(
        12.5,
    ),
    Boolean(
        false,
    ),
]
xxated commented 2 years ago


Performance is the main key in my current circumstances; I will update all the related libs and let you know if I face any issues! Thanks so much for your hard work, really appreciate it!