launchbadge / sqlx

🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite.
Apache License 2.0
13.25k stars 1.26k forks source link

[SQLite] Quick review #296

Closed gwenn closed 4 years ago

gwenn commented 4 years ago

https://www.sqlite.org/version3.html

SQLite is not particular about the text it receives and is more than happy to process text strings that are not normalized or even well-formed UTF-8 or UTF-16. Thus, programmers who want to store IS08859 data can do so using the UTF-8 interfaces.

mehcode commented 4 years ago

:wave: Really appreciate you taking the time to review the implementation here.


Text value can contains '\0' and have any encoding

You're absolutely right. We should not use unchecked there but properly die if it's not valid UTF-8. The user can select as a blob if they want to process non-UTF-8 in Rust.

Columns cannot be cached

I do see your point here; however, this would only trigger if the user does two very not-recommended actions. The user would need to interleave a DDL command with a prepared SELECT * query.

I'm of the opinion that we carefully document we simply don't support this. This affects all database drivers. A data point here is the Java driver for PostgreSQL caches column names and thus has a strange error with SELECT * and DDL. They simply recommend against doing SELECT *.

With that said, it's quite unfortunate there isn't a way to get the newer prepare interface of sqlite to not auto-recompile on schema changes as that would be a nice "free" way to support this for SQLite.

WAL mode limitation

I did miss that. In v0.4 we'll be moving to proper connection string handling and I aim to handle more configuration options there. We should probably just expose journal_mode as a parameter and document "you should probably set this to 'WAL' unless reasons".


Edit: More, on column caching. Perhaps an explicit way to "flush" these caches would be enough for the developer that wants to use DDL and SELECT * in the dynamic API.

emilazy commented 4 years ago

We should not use unchecked there but properly die if it's not valid UTF-8. The user can select as a blob if they want to process non-UTF-8 in Rust.

Note that the issue isn't just the panic, but that valid UTF-8 text can contain NUL bytes, which SQLite and Rust both support but the sqlite3_column_text interface doesn't. From my understanding of the docs, the correct thing to would be to use sqlite3_column_bytes even for TEXT and avoid going through CStr.

mehcode commented 4 years ago

@emilazy You make a good point. I'll make that change.

mehcode commented 4 years ago

@emilazy I was adding this in and I noticed that in encoding it's critical that sqlite3_bind_text is used and not sqlite3_bind_blob. However, sqlite3_column_blob is significantly better for decoding.

The consequence of not using sqlite3_bind_text means that SELECT 'Hello' = ? with a bound value of "Hello" from Rust will be FALSE from SQL. This is because (and I'm guessing here, but I'm probably right) of the transparent null terminator from the text literal and SQLite comparing bytes and text using a memcpy and not specializing that for text.

emilazy commented 4 years ago

Makes sense. It seems like sqlite3_bind_text accepts a length parameter anyway, so hopefully it works with strings with embedded NUL characters? But given C's general misfeatures around strings I wouldn't be surprised if SQLite has edge cases here.

mehcode commented 4 years ago

I can confirm that embedded NULs do work with sqlite3_bind_text. A bit annoying to test as you can't have embedded NULs in a query string.

select cast(x'7468697320006973206E756C2D636F6E7461696E696E67' as text) = ?
sqlite3_bind_text( ... , "this \0is nul-containing")
alun commented 4 years ago

It looks like something wrong with using WAL in MacOS I don't see it's making enough checkpoints. I'd assume default value where checkpoint should be auto triggered is 1000 pages. Which is probably 4M, while I'm seeing it growing to 500M:

-rw-r--r--   1 alunacharskii  staff     585728  7 Jun 10:38 trades.db
-rw-r--r--   1 alunacharskii  staff     950272  7 Jun 12:06 trades.db-shm
-rw-r--r--   1 alunacharskii  staff  487000512  7 Jun 12:07 trades.db-wal

Maybe we could have added https://www.sqlite.org/c3ref/wal_checkpoint_v2.html to the interface, so I can run it manually?

UPD: actually I've tried running PRAGMA wal_checkpoint(FULL) which supposed to do the same as the C function but it didn't help.

UPD2: PRAGMA wal_checkpoint(TRUNCATE) helped

UPD3: fixed a mistake with not complete cursor evaluation https://github.com/launchbadge/sqlx/issues/326#issuecomment-640480158 (this caused checkpoints to not trigger) and added graceful shutdown with closing statically OnceCell-stored pool, now the WAL file works perfectly for me!

mehcode commented 4 years ago

Thanks again for the feedback. These are now all addressed on master.

After much thought, I decided to go with your recommendation and allow columns to change after a statement is prepared.