launchbadge / sqlx

🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite.
Apache License 2.0
13.31k stars 1.26k forks source link

SQLite: Spurrious SEGFAULTs while reading Strings #634

Closed fosskers closed 2 years ago

fosskers commented 4 years ago

Hi there, I'm running 0.4.0-beta.1 (actually, this branch of it to fix the CPU issue). I've been doing some throughput testing, but have been getting a decent amount of segfaults. Sometimes these just crash the server without giving any trace, and sometimes they give messages like this:

thread 'async-std/runtime' panicked at 'byte index 3 is not a char boundary; it is inside '\u{1}' (bytes 2..3) of `ëa long piece of string data that hopefully won't be copied around on the heap so much!`', /home/colin/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/str/mod.rs:1920:47

The contents of my DB string fields are definitely ASCII, so I'm not sure what's happening here.

My code is available here, where I'm combining Tide and sqlx.

The single row in my database is just:

sqlite> select * from test;
John|40|This is a long piece of string data that hopefully won't be copied around on the heap so much!|2|2020-08-07T12:00:00Z|
Patryk27 commented 4 years ago

While I haven't been able to reproduce this bug locally, while looking through sqlx's code I did find a code that seemed pretty unsound to me:

// src/sqlite/value.rs:88

impl<'r> SqliteValueRef<'r> {
    pub(super) fn text(&self) -> Result<&'r str, BoxDynError> {
        match self.0 {
            SqliteValueData::Statement {
                statement, index, ..
            } => statement.column_text(index),

            SqliteValueData::Value(v) => v.text(),
        }
    }
}

... this code invokes column_text(), which underneath goes to:

// src/sqlite/statement/handle.rs:263

impl StatementHandle {
    pub(crate) fn column_blob(&self, index: usize) -> &[u8] {
        /* ... */

        let ptr = unsafe { sqlite3_column_blob(self.0.as_ptr(), index) } as *const u8;

        /* ... */
    }
}

The catch here lays in the fact that, according to SQLite's docs, the pointer returned from sqlite3_column_blob() might be invalidated when we call sqlite3_step() - and since StatementHandle implements Copy, its lifetime (and thus especially the 'r lifetime for SqliteValueRef) is totally bogus and doesn't encapsulate the lifetime of blob / of text correctly.

fosskers commented 4 years ago

I'm not sure if this is a load issue or what, since you'd think "reading strings causes segfaults" would have been noticed by now ;) I'm smacking my little toy server as hard as I can with vegeta.

mehcode commented 4 years ago

I'm reasonably sure how we handle sqlite3_column_blob is correct. Read here: https://github.com/launchbadge/sqlx/blob/master/sqlx-core/src/sqlite/row.rs#L17

The lifetime 'r on the ValueRef refers to the lifetime of the row, not the connection. And if a second row is created, the first should use sqlite3_value_dup.


I appreciate the minimal example you provided. I'll see if I can't dig into that soon.

Patryk27 commented 4 years ago

@mehcode I'm quite sure lifetimes are wrong though (or at least inflating isn't invoked sufficiently enough) - here's a simple PoC showing this bug:

use sqlx::{Connection, Decode, Row};
use tokio::stream::StreamExt;

const TEXT: &str = "
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor \
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis \
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. \
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore \
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt \
in culpa qui officia deserunt mollit anim id est laborum.
";

#[tokio::main]
async fn main() {
    let mut db = sqlx::SqliteConnection::connect(":memory:")
        .await
        .unwrap();

    sqlx::query("CREATE TABLE logs (log TEXT NOT NULL)")
        .execute(&mut db)
        .await
        .unwrap();

    sqlx::query(&format!("INSERT INTO logs VALUES ('{}')", TEXT))
        .execute(&mut db)
        .await
        .unwrap();

    let mut rows_a = sqlx::query("SELECT * FROM logs").fetch(&mut db);
    let row1 = rows_a.try_next().await.unwrap().unwrap();
    let value1 = row1.try_get_raw(0).unwrap();
    let value1: &str = Decode::decode(value1).unwrap();

    drop(rows_a);

    let mut rows_b = sqlx::query("SELECT * FROM logs").fetch(&mut db);
    let row2 = rows_b.try_next().await.unwrap().unwrap();
    let value2 = row2.try_get_raw(0).unwrap();
    let value2: &str = Decode::decode(value2).unwrap();

    println!("{}", value1);
    println!("{}", value2);
}

In my case, it randomly prints something like:

 adipisc8Vpor incididu,V (p Ut oVoV (paliquip mmodo consequat. Duis aute irure dolor i"(pin voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Edit: the bug is gone when you execute two different queries though - following code doesn't trigger the use-after-free (I think) anymore:

/* ... */
let mut rows_a = sqlx::query("SELECT * FROM logs WHERE 1=1").fetch(&mut db);
/* ... */
let mut rows_b = sqlx::query("SELECT * FROM logs WHERE 2=2").fetch(&mut db);
/* ... */
mehcode commented 4 years ago

You're not wrong, but you're also showcasing a different bug there. The Decode<Sqlite> impl for &str does allow one to produce an unsound lifetime. The linked code in the OP is not using that impl and thus shouldn't have the same issue.


The real problem here is a lot of what's done in SQLite here is an experiment to remove the lifetime on Row, which we had in 0.3, which kept it sound easily. The problem with that approach is it broke the type system because Rust has neither GATs nor lazy normalization.

Fixing this by threading the lifetime correctly in the current design requires a nested HRTB with type parameters.

Your example is easily "fixed" by removing the unsound impl of Decode but the example in OP shows there is a root problem elsewhere, under load.

Patryk27 commented 4 years ago

Okie, got it - just thought it might be related :-)

mehcode commented 4 years ago

Definitely related, if only tangentially. I'm just providing context.

Honestly, looking into this issue makes me think we really need to figure out how to have Row<'c> ('c being the connection) - without breaking the type system.

abonander commented 2 years ago

Do you mind trying the changes in #1551 to see if this segfault still happens? I got rid of the use of sqlite3_column_blob() there.