phiresky / sqlite-zstd

Transparent dictionary-based row-level compression for SQLite
GNU Lesser General Public License v3.0
1.4k stars 47 forks source link

sqlite-zstd

Extension for sqlite that provides transparent dictionary-based row-level compression for sqlite. This basically allows you to compress entries in a sqlite database almost as well as if you were compressing the whole DB file, but while retaining random access.

See also the announcement blog post for some motivation, benchmarks and ramblings: https://phiresky.github.io/blog/2022/sqlite-zstd

size comparison chart

Depending on the data, this can reduce the size of the database by 80% while keeping performance mostly the same (or even improving it, since the data to be read from disk is smaller).

Note that a compression VFS such as https://github.com/mlin/sqlite_zstd_vfs might be suited better depending on the use case. That has very different tradeoffs and capabilities, but the end result is similar.

Transparent Compression

Basic Functionality

Compiling

This project can be built in two modes: (a) as a Rust library and (b) as a pure SQLite extension (with --features build_extension).

You can get the SQLite extension binaries from the GitHub releases. Alternatively, you can build the extension by hand:

cargo build --release --features build_extension
# should give you target/release/libsqlite_zstd.so

Cross Compiling

For cross-compiling to aarch64-linux-android, you need to

  1. Download the target we need to cross-compile

    rustup target add aarch64_linux_android
  2. Prepare the Android NDK (The binutils is deprecated and removed from NDK 23+, so you need to download an older version of NDK)

  3. Setup NDK binary path

    export PATH="$PATH:<NDK_DIR>/toolchains/llvm/prebuilt/linux-x86_64/bin"
  4. Specify linker in cargo configuration file

    [target.aarch64-linux-android]
    linker = "aarch64-linux-android23-clang"
  5. Specify target accordingly when building

    cargo build -r --features build_extension --target aarch64-linux-android

As a Python "extension"

If you want to use this project as an SQLite extension inside a Python project, you can install it as a Python package (you still need to have a rust compiler to actually build the binary):

pip install 'git+https://github.com/phiresky/sqlite-zstd.git#egg=sqlite_zstd&subdirectory=python'

This installs the extension as a Python package, with some support code to make it easy to use from Python code or Datasette.

Usage

You can either load this library as SQLite extension or as a Rust library. Note that sqlite extensions are not persistent, so you need to load it each time you connect to the database.

Is this library production ready?

I wouldn't trust it with my data (yet). Make sure you have backups of everything. I'm also not making any guarantees for backwards compatibility of future updates, though migrating by copying over the uncompressed data should of course work fine.

Sqlite CLI

Either load it in the REPL:

$ sqlite3 file.db
SQLite version 3.34.0 2020-12-01 16:14:00
sqlite> .load .../libsqlite_zstd.so
[2020-12-23T21:30:02Z INFO  sqlite_zstd::create_extension] [sqlite-zstd] initialized
sqlite>

Or alternatively:

sqlite3 -cmd '.load libsqlite_zstd.so' 'select * from foo'

C Api

int success = sqlite3_load_extension(db, "libsqlite_zstd.so", NULL, NULL);

See here for more information.

Rust

The recommended method is to add sqlite_zstd as a dependency to your project, then load it using

let conn: rusqlite::Connection;
sqlite_zstd::load(&conn)?;

Alternatively, you can load the extension like any other extension:

let conn: rusqlite::Connection;
conn.load_extension("libsqlite_zstd.so", None)?;

See here for more information.

Python

If you have installed this as a Python module as described above, you can load the extension into an existion SQLite connection like this:

import sqlite3
import sqlite_zstd

conn = sqlite3.connect(':memory:')
sqlite_zstd.load(conn)

When using Datasette, this extension is loaded automatically into every connection.

Verbosity / Debugging

You can change the log level by setting the environment variable SQLITE_ZSTD_LOG=error for less logging and SQLITE_ZSTD_LOG=debug for more logging.

Future Work / Ideas / Todo