prisma / tiberius

TDS 7.2+ (Microsoft SQL Server) driver for Rust
Apache License 2.0

Query Performance #294

Open willbush opened 1 year ago

willbush commented 1 year ago

I've been testing the query performance of tiberius and odbc-api vs. C#.

These are the numbers I'm getting, more or less, for 3,838,164 rows:

  • C# (dotnet): ~11 sec
  • Rust odbc-api: ~14 sec
  • Rust tiberius: ~20 sec

I'm seeing nearly double the speed in C# compared to tiberius. I apologize that this testing isn't very rigorous or reproducible; I'm working with proprietary data and sprocs. I did three runs per library on a Windows machine against an on-prem production database.

I suppose, to make this reproducible, a test could be written to fill a local DB with a bunch of data and test how long it takes to query all the rows. I'm willing to help with that.
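
As a starting point, a seeding helper along those lines is sketched below. It is not from the issue: the table name bench_rows, the column layout, and the row counts are made up, and it assumes the same tiberius Client type as in the example further down.

use anyhow::Result;
use async_std::net::TcpStream;
use tiberius::Client;

// Hypothetical seeding routine for a reproducible local benchmark: creates a
// table and fills it with roughly 1,000,000 rows of random text, generated
// server-side so the inserts stay cheap on the client.
async fn seed_bench_table(client: &mut Client<TcpStream>) -> Result<()> {
    client
        .execute(
            "CREATE TABLE bench_rows (id INT IDENTITY PRIMARY KEY, payload NVARCHAR(100) NOT NULL)",
            &[],
        )
        .await?;

    // 100 batches of 10,000 rows each.
    for _ in 0..100 {
        client
            .execute(
                "INSERT INTO bench_rows (payload) \
                 SELECT TOP 10000 CONVERT(NVARCHAR(100), NEWID()) \
                 FROM sys.all_objects a CROSS JOIN sys.all_objects b",
                &[],
            )
            .await?;
    }

    Ok(())
}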

Anyway, I was wondering if I'm doing anything obviously wrong:

tiberius

use anyhow::Result;
use async_std::net::TcpStream;
use once_cell::sync::Lazy;
use std::env;
use std::time::Instant;
use tiberius::{Client, Config, Query, SqlBrowser};

static CONN_STR: Lazy<String> = Lazy::new(|| {
    env::var("TIBERIUS_TEST_CONNECTION_STRING").unwrap_or_else(|_| {
        "Data Source=Prod.xyz.local\\Prod;Database=Prod;Integrated Security=true;Connection Timeout=200;Max Pool Size=250;Encrypt=False;".to_owned()
    })
});

#[async_std::main]
async fn main() -> Result<()> {
    let config = Config::from_ado_string(&CONN_STR)?;
    let tcp = TcpStream::connect_named(&config).await?;

    tcp.set_nodelay(true)?;

    let mut client = Client::connect(config, tcp).await?;

    let select = Query::new("[dbo].[usp_GetLotsOfThings]");

    let start_time = Instant::now();

    let rows = select.query(&mut client).await?.into_first_result().await?;

    let elapsed_time = start_time.elapsed();

    println!("Elapsed time (secs): {:?}", elapsed_time.as_secs());

    println!("number of rows: {}", rows.len());
}
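
For comparison, a streaming variant of the same read is sketched below. It is not part of the original benchmark; it assumes tiberius' QueryStream::into_row_stream plus the futures-util crate, and it only counts rows instead of materializing them, so the ~3.8 million rows are never all held in a Vec at once.

use futures_util::TryStreamExt; // additional dependency: futures-util

// Inside main, instead of into_first_result():
let select = Query::new("[dbo].[usp_GetLotsOfThings]");
let stream = select.query(&mut client).await?;
let mut rows = stream.into_row_stream();

let mut count: usize = 0;
while let Some(_row) = rows.try_next().await? {
    // A real benchmark would read columns from _row here (e.g. _row.get(0)).
    count += 1;
}

println!("number of rows: {}", count);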

odbc-api

use std::time::Instant;

use anyhow::Error;
use odbc_api::{buffers::TextRowSet, Cursor, Environment};

/// Maximum number of rows fetched with one row set. Fetching batches of rows is usually much
/// faster than fetching individual rows.
const BATCH_SIZE: usize = 5000;

fn main() -> Result<(), Error> {
    // Production
    let connection_string = "
        Driver={ODBC Driver 17 for SQL Server};\
        Server=Prod.xyz.local\\Prod;\
        Database=Prod;\
        Trusted_Connection=Yes;\
        Connection Timeout=200;\
        Max Pool Size=250;\
        Encrypt=No;\
    ";

    let env = Environment::new()?;
    let conn = env.connect_with_connection_string(&connection_string)?;

    let mut prepared = conn.prepare("[dbo].[usp_GetLotsOfThings]")?;
    let mut count = 0;

    let start_time = Instant::now();

    match prepared.execute(()) {
        Err(e) => println!("{}", e),
        Ok(None) => println!("No result set generated."),
        Ok(Some(mut cursor)) => {
            // Use schema in cursor to initialize a text buffer large enough to hold the largest
            // possible strings for each column up to an upper limit of 4KiB.
            let mut buffers = TextRowSet::for_cursor(BATCH_SIZE, &mut cursor, Some(4096))?;
            // Bind the buffer to the cursor. It is now being filled with every call to fetch.
            let mut row_set_cursor = cursor.bind_buffer(&mut buffers)?;

            // Iterate over batches
            while let Some(batch) = row_set_cursor.fetch()? {
                // Within a batch, iterate over every row
                for row_index in 0..batch.num_rows() {
                    // Within a row iterate over every column
                    let _record = (0..batch.num_cols())
                        .map(|col_index| batch.at(col_index, row_index).unwrap_or(&[]));

                    count += 1;
                }
            }
        }
    }
    println!("Count: {}", count);

    let elapsed_time = start_time.elapsed();

    println!("Elapsed time (secs): {:?}", elapsed_time.as_secs());

    Ok(())
}

ping @pacman82

pacman82 commented 1 year ago

Hi, thanks for investigating performance. I do not see anything obviously wrong, but I just want to state a few things which may (or may not) be obvious to you.

  • TextRowSet is intended to be used if the entire table content is supposed to be printed as text. In many scenarios you may want to fetch an integer column as an actual integer; in those scenarios TextRowSet may add additional overhead.
  • An application aiming for speed (and not processing everything as text) should likely use ColumnarAnyBuffer for fetching rows. This is more code, since you have to set up buffer descriptions. For a quick evaluation of whether this makes a difference or not, you could employ arrow-odbc. That artefact fetches everything into fitting Arrow buffers. However, contrary to your example, this does the additional work of actually doing something with the results and copying them into Arrow buffers. It might still be faster overall though (depending on your table).
  • Depending on their abstraction level, these libraries might do different amounts of work after fetching the data. I would define a common target for each benchmark (like parsing the data into a common data structure, or calculating the sum of a column full of integers; a small timing sketch follows this comment). This IMHO makes for a fairer comparison. It would also enable me to give better answers on how to write faster code. E.g. your current example does nothing with the data, so the fastest way to do that would be not to query at all. Of course you know that; the reason I am a bit nitpicky about the application doing something is that odbc-api is low level and unbiased about the use case. It is hard to tell whether it is used correctly without the use case in mind.

Cheers, Markus
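
A minimal timing harness along the lines of the "common target" suggestion above might look like this. This is a sketch only; the closure signature and the rows/sum workload are made up for illustration.

use std::time::Instant;

/// Time a fetch-and-reduce closure and report rows per second, so every driver
/// under test does the same amount of post-fetch work (here: counting rows and
/// summing one integer column).
fn bench<F>(label: &str, mut fetch_and_sum: F)
where
    F: FnMut() -> (usize, i64), // (row count, sum of an integer column)
{
    let start = Instant::now();
    let (rows, sum) = fetch_and_sum();
    let secs = start.elapsed().as_secs_f64();
    println!(
        "{}: {} rows, sum = {}, {:.2} s, {:.0} rows/s",
        label, rows, sum, secs, rows as f64 / secs
    );
}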

willbush commented 1 year ago

@pacman82 Thanks for the detailed response. I'll look into this more, probably this weekend. arrow-odbc sounds interesting; I'll check that out too.

willbush commented 1 year ago

Sorry, I never got around to testing this further. I ended up using tiberius for a Slack bot because I realized I need async support more than performance at the moment. I was worried about performance because, in the systems we operate where it matters, Rust becomes a hard sell as an alternative to C#. I may look into this more later, especially if there's interest.

pacman82 commented 1 year ago

Hello @willbush ,

no problem. At one of my previous employers I have also been advocating for using Rust more. Apart from one exception, my motivations for introducing it were not around performance, but around maintainability and ease of deployment. I know it seems a tougher sell, because performance is (in this context) easy to measure.

My point is this: understand your true motivation for wanting to use Rust, and sell it based on that.

Just my two cents of experience.

Best, Markus

bangbaew commented 1 year ago

My table has only one row, with an 85 MB XML string stored in a column. A .NET Framework 4.8 query took 800 ms, while Tiberius' stream.into_results().await.unwrap(); took 6 s. How can I get Tiberius to match the .NET one?

I also tried odbc-api. I have to set max_str_limit, and since the column in my schema is nvarchar(max), I set max_str_limit to usize::MAX, which panicked:

thread 'main' panicked at 'range end index 18446744073709551615 out of range for slice of length 0', C:\Users\Me\.cargo\registry\src\index.crates.io-6f17d22bba15001f\odbc-api-0.57.0\src\buffers\text_column.rs:103:14

How can I configure odbc-api to use a flexible string size?

bangbaew commented 1 year ago

I tried this method to get large payloads from the table.

let cursor = conn.execute(&query, ()).unwrap().unwrap();
let payloads = process_rows(cursor);

fn process_rows(mut cursor: CursorImpl<StatementImpl<'_>>) -> Vec<String> {
    let mut payloads = Vec::new();

    if let Some(mut row) = cursor.next_row().unwrap() {
        let mut buf = Vec::new();
        row.get_text(1, &mut buf).unwrap();
        let ret = String::from_utf8(buf).unwrap();
        payloads.push(ret);

        // Recursive call to process the next row
        payloads.extend(process_rows(cursor));
    }

    payloads
}

The performance really satisfies me: it took only 1.4 s for an 85 MB query, which is on par with the .NET version, while Tiberius took 6 s, which is too slow. But I'm new to Rust, and I'm not sure if this is the safest or most efficient way to implement this kind of job, so I would greatly appreciate your suggestions for any potential improvements. Thank you.

Edit: I added 85 MB more data to the table, and now the Rust odbc-api version is faster than the .NET version (Rust at 2.7540951 s, .NET at 3.3257967 s).

pacman82 commented 1 year ago

Hello @bangbaew ,

thanks for the insightful numbers. I am the author of odbc-api. Looking at your code, there is something you might want to try. Your current algorithm fetches each new row with an empty buffer. This causes an additional roundtrip to the database to figure out the size of the string. At the price of an extra copy of the payload you could optimize two things:

  1. Reuse the same buffer for fetching data, to minimize roundtrips to the database. This also causes the extra copy, because instead of moving the buffer into the resulting string you now have to copy it.
  2. Once you always reuse the same buffer, you can minimize function calls into the driver manager by keeping it bound to the cursor instead of binding it anew for every row. This is, however, only a saving in the order of milliseconds, and it is usually more worthwhile if you have many small rows instead of a few big ones.

Whether or not these two optimizations would offset the extra copy depends on the use case; I cannot tell. Depending on how much you care, it is worth trying though. Usually, minimizing roundtrips is king.

Best, Markus

bangbaew commented 1 year ago

Great, I tried reusing the buffer like this, querying from a table with 10 rows of 85 MB string data.

let cursor = conn.execute(&query, ()).unwrap().unwrap();

let mut buf = Vec::new();
let payloads = process_rows(cursor, &mut buf);

fn process_rows(mut cursor: CursorImpl<StatementImpl<'_>>, buf: &mut Vec<u8>) -> Vec<String> {
    let mut payloads = Vec::new();

    if let Some(mut row) = cursor.next_row().unwrap() {
        row.get_text(1, buf).unwrap();
        let ret = String::from_utf8(buf.to_vec()).unwrap();
        payloads.push(ret);

        // Recursive call to process the next row
        payloads.extend(process_rows(cursor, buf));
    }

    payloads
}

The query time drops from 1.8 s to 1.4 s, which I consider a lot of milliseconds! I hope the next releases of Tiberius are as fast as this.
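
For many rows, an iterative version of the same buffer-reuse idea may be preferable. The sketch below is not from the thread: it avoids growing the call stack with the row count while keeping the extra copy pacman82 mentioned, and it assumes the same odbc-api types the code above already uses.

use odbc_api::{handles::StatementImpl, Cursor, CursorImpl, Error};

fn process_rows_iter(mut cursor: CursorImpl<StatementImpl<'_>>) -> Result<Vec<String>, Error> {
    let mut payloads = Vec::new();
    let mut buf = Vec::new();
    // Consume the cursor row by row, reusing the same fetch buffer each time.
    while let Some(mut row) = cursor.next_row()? {
        buf.clear();
        row.get_text(1, &mut buf)?;
        // The clone is the extra copy traded for the buffer reuse.
        payloads.push(String::from_utf8(buf.clone()).expect("payload is valid UTF-8"));
    }
    Ok(payloads)
}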

bangbaew commented 1 year ago

Currently it takes ~1.5 s for each row.get_text() call to complete, so if I have many rows it will take a very long time. I tried printing the time spent fetching each row (screenshot of the per-row timings omitted).

Is it possible to spawn a thread for each row and process the data like this, so I can get all the rows at the same time?

loop {
    if let Some(mut row) = cursor.next_row().unwrap() {
        tokio::spawn(async move {
            let mut buf = Vec::new();

            let start = SystemTime::now();
            row.get_text(1, &mut buf).unwrap();
            let ret = String::from_utf8(buf.to_vec()).unwrap();

            println!("\nRow Fetch: {:?}", start.elapsed());

            do_something_with_the_result(&ret);
        });
    } else {
        break;
    }
}

This gives an error, as expected, because I have no idea how to implement it:

future cannot be sent between threads safely
the trait `std::marker::Sync` is not implemented for `*mut odbc_api::odbc_sys::Dbc`

pacman82 commented 1 year ago

No, each cursor must be consumed sequentially. You can, however, fetch multiple rows at once; this is what bulk cursors are for. Since your cells all seem to be text, TextRowSet is your best friend here. This helps you eliminate the IO overhead per row. Please note that your payload in the fields is quite large, so it is unclear whether your times are dominated by IO overhead; much more likely they are already dominated by the payload of the rows themselves. It simply takes the network some amount of time to transport that much data.

Also note that if you really have a lot of large rows and want to fetch them all at once, you also need enough memory to do so (with rows like yours, even a batch of 10 rows at 85 MB per cell needs on the order of 850 MB of buffer). The allocation of the TextRowSet buffer makes this requirement obvious.

Best, Markus
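
If the slow part were the per-row processing rather than the fetch itself, one pattern that respects the sequential-cursor rule is to fetch the payloads sequentially and only hand the owned Strings off to spawned tasks. The sketch below is not from the thread; it assumes a tokio runtime and a hypothetical do_something_with_the_result function.

use odbc_api::{handles::StatementImpl, Cursor, CursorImpl, Error};

// Hypothetical CPU-bound post-processing step.
fn do_something_with_the_result(_payload: &str) {}

// Note: the cursor is not Send, so this future should be awaited directly from
// main (or via block_on) rather than passed to tokio::spawn itself.
async fn fetch_then_process(mut cursor: CursorImpl<StatementImpl<'_>>) -> Result<(), Error> {
    let mut handles = Vec::new();
    // The cursor itself is consumed sequentially on this task...
    while let Some(mut row) = cursor.next_row()? {
        let mut buf = Vec::new();
        row.get_text(1, &mut buf)?;
        let payload = String::from_utf8(buf).expect("payload is valid UTF-8");
        // ...and only the owned payload crosses into a spawned task.
        handles.push(tokio::spawn(async move {
            do_something_with_the_result(&payload);
        }));
    }
    for handle in handles {
        handle.await.expect("processing task panicked");
    }
    Ok(())
}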