tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License

Verify bottomless WAL checksumming algorithm #598

Open psarna opened 10 months ago

psarna commented 10 months ago

Description of the original SQLite WAL checksumming algorithm: https://www.sqlite.org/fileformat.html#checksum_algorithm
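For reference, here is a minimal Rust sketch of the algorithm as the linked page describes it. This is not the code in `wal.rs`; the name `wal_checksum` and its signature are hypothetical. The input is read as pairs of 32-bit words whose byte order is chosen by the WAL header magic, and all additions wrap modulo 2^32. The `seed` parameter carries the running `(s0, s1)` pair, because SQLite chains checksums: the header starts from `(0, 0)` and each frame continues from the previous checksum.

```rust
/// Sketch of the WAL checksum described in the SQLite file-format docs.
/// `data` is read as consecutive pairs of 32-bit words; `big_endian` picks
/// how each word is decoded (SQLite derives this from the WAL header magic).
/// `seed` is the running (s0, s1) pair: (0, 0) for the WAL header, or the
/// previous checksum when continuing the chain into the next frame.
fn wal_checksum(data: &[u8], big_endian: bool, seed: (u32, u32)) -> (u32, u32) {
    assert!(data.len() % 8 == 0, "checksum input must be a multiple of 8 bytes");
    let (mut s0, mut s1) = seed;
    for pair in data.chunks_exact(8) {
        let w0 = [pair[0], pair[1], pair[2], pair[3]];
        let w1 = [pair[4], pair[5], pair[6], pair[7]];
        let (x0, x1) = if big_endian {
            (u32::from_be_bytes(w0), u32::from_be_bytes(w1))
        } else {
            (u32::from_le_bytes(w0), u32::from_le_bytes(w1))
        };
        // s0 += x(i) + s1; s1 += x(i+1) + s0, with C-style unsigned wrap-around.
        s0 = s0.wrapping_add(x0).wrapping_add(s1);
        s1 = s1.wrapping_add(x1).wrapping_add(s0);
    }
    (s0, s1)
}
```

For example, the 8 bytes `01 00 00 00 02 00 00 00` give `(1, 3)` when read as little-endian words and `(0x0100_0000, 0x0300_0000)` when read as big-endian, so a reader that picks the wrong order gets a completely different checksum.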

Now, our checksumming implementation is here: https://github.com/tursodatabase/libsql/blob/508ee178007106f9862172ba894500b476b6da85/bottomless/src/wal.rs#L244

Two things I find confusing:

  1. The original algorithm reads the checksummed data as either little-endian or big-endian 32-bit words, depending on the magic number in the WAL header. I don't see any such distinction in our implementation, so maybe we rely on an assumption that doesn't always hold?
  2. The function above is named checksum_be, but I don't see how it uses big-endian rather than just host endianness. Can somebody clarify?

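To make both questions concrete, here is a hedged Rust sketch of what the distinction looks like. It does not reproduce what `checksum_be` in `wal.rs` actually does, and the helpers `checksum_is_big_endian`, `read_word` and `read_word_host` are hypothetical. The magic number itself is always stored big-endian (every WAL header field is), and its lowest bit selects the word order for the checksum input. A host-endian read such as `u32::from_ne_bytes` agrees with the header-directed read only when the host's byte order happens to match that bit.

```rust
/// WAL magic numbers from the SQLite file-format docs. The lowest bit
/// selects the byte order used to decode checksum input words.
const WAL_MAGIC_LE: u32 = 0x377f_0682;
const WAL_MAGIC_BE: u32 = 0x377f_0683;

/// Hypothetical helper: Some(true) if checksum words are big-endian,
/// Some(false) if little-endian, None if this is not a WAL header.
/// The magic itself is always stored big-endian, like every header field.
fn checksum_is_big_endian(wal_header: &[u8]) -> Option<bool> {
    if wal_header.len() < 4 {
        return None;
    }
    let magic = u32::from_be_bytes([wal_header[0], wal_header[1], wal_header[2], wal_header[3]]);
    match magic {
        WAL_MAGIC_BE => Some(true),
        WAL_MAGIC_LE => Some(false),
        _ => None,
    }
}

/// Decodes one checksum input word in the byte order the header asks for.
fn read_word(bytes: [u8; 4], big_endian: bool) -> u32 {
    if big_endian {
        u32::from_be_bytes(bytes)
    } else {
        u32::from_le_bytes(bytes)
    }
}

/// A host-endian read: it ignores the header, so it matches `read_word`
/// only when the machine's byte order equals the one in the WAL magic.
fn read_word_host(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}
```

If I read SQLite's `wal.c` correctly, the writer sets the magic to match the native byte order of the machine that created the WAL, so a host-endian read would only go wrong when the reader's byte order differs from the writer's.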
psarna commented 10 months ago

I just saw bottomless-cli fail to restore a backup created with:

turso db shell http://localhost:8080/ 'create table t(id);'
for x in {1..32}; do turso db shell http://localhost:8080/ 'insert into t values (random());'; done

It has 30-something WAL frames backed up, and it fails to verify their checksums. However, with LIBSQL_BOTTOMLESS_VERIFY_CRC=false the restore goes through, and the restored database matches the local db contents.
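A hedged sketch of how a restore-time verifier could walk the backed-up frames, assuming it sees the 32-byte WAL header and each frame as a 24-byte frame header followed by the page. `verify_wal` and `WalError` are hypothetical names, not bottomless APIs, and the salt checks SQLite also performs are left out. Because each frame's checksum is seeded with the previous one, a wrong byte order or a mis-seeded chain already fails at the header or the first frame, and reporting the index shows where the chain diverges rather than only that it does.

```rust
const WAL_HEADER_SIZE: usize = 32;
const FRAME_HEADER_SIZE: usize = 24;

#[derive(Debug, PartialEq)]
enum WalError {
    BadMagic,
    BadHeaderChecksum,
    MalformedFrame(usize),
    BadFrameChecksum(usize),
}

/// Same chained checksum as in the file-format docs (wrapping u32 adds).
fn wal_checksum(data: &[u8], big_endian: bool, seed: (u32, u32)) -> (u32, u32) {
    let (mut s0, mut s1) = seed;
    for p in data.chunks_exact(8) {
        let (w0, w1) = ([p[0], p[1], p[2], p[3]], [p[4], p[5], p[6], p[7]]);
        let (x0, x1) = if big_endian {
            (u32::from_be_bytes(w0), u32::from_be_bytes(w1))
        } else {
            (u32::from_le_bytes(w0), u32::from_le_bytes(w1))
        };
        s0 = s0.wrapping_add(x0).wrapping_add(s1);
        s1 = s1.wrapping_add(x1).wrapping_add(s0);
    }
    (s0, s1)
}

/// Reads a big-endian u32 field, as all WAL header/frame-header fields are.
fn be_u32(b: &[u8], off: usize) -> u32 {
    u32::from_be_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Hypothetical verifier: checks the header checksum, then each frame in order.
fn verify_wal(header: &[u8], frames: &[&[u8]]) -> Result<(), WalError> {
    if header.len() < WAL_HEADER_SIZE {
        return Err(WalError::BadMagic);
    }
    // The magic's lowest bit selects the word order for all checksums.
    let big_endian = match be_u32(header, 0) {
        0x377f_0683 => true,
        0x377f_0682 => false,
        _ => return Err(WalError::BadMagic),
    };
    // Header checksum: first 24 bytes, seeded with (0, 0), stored at 24..32.
    let mut sum = wal_checksum(&header[..24], big_endian, (0, 0));
    if sum != (be_u32(header, 24), be_u32(header, 28)) {
        return Err(WalError::BadHeaderChecksum);
    }
    for (i, frame) in frames.iter().enumerate() {
        if frame.len() < FRAME_HEADER_SIZE || (frame.len() - FRAME_HEADER_SIZE) % 8 != 0 {
            return Err(WalError::MalformedFrame(i));
        }
        // Continue the chain over the first 8 frame-header bytes, then the page.
        sum = wal_checksum(&frame[..8], big_endian, sum);
        sum = wal_checksum(&frame[FRAME_HEADER_SIZE..], big_endian, sum);
        if sum != (be_u32(frame, 16), be_u32(frame, 20)) {
            return Err(WalError::BadFrameChecksum(i));
        }
    }
    Ok(())
}
```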