tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License

bottomless: less bugs more robustness #1685

Open sivukhin opened 1 month ago

sivukhin commented 1 month ago

Context

There are several known issues with the bottomless restore process:

  1. There is a bug when S3 holds more than 1 page of data: bottomless always stops its work after the first page because the last_received_frame_no variable is used incorrectly
  2. bottomless relies on the last connection performing a checkpoint. This holds if the DB is valid, but if the DB is malformed the last connection just exits silently and leaves the DB empty (a 4KB DB file and some data in the WAL). The current implementation ignores this situation and simply restores an empty DB
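The second issue can be detected from the file system alone: SQLite keeps the write-ahead log next to the database in a file named `<db>-wal`, so a WAL file that survives the drop of the last connection hints that the final checkpoint never ran. Below is a minimal sketch of such a check. The function names `wal_path` and `ensure_no_wal_left` and the error message text are hypothetical and are not taken from the bottomless code; only the `-wal` suffix and the BOTTOMLESS CAUTION prefix come from SQLite and from this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the conventional SQLite WAL path for a database file: "<db>-wal".
fn wal_path(db_path: &Path) -> PathBuf {
    let mut name = db_path.as_os_str().to_os_string();
    name.push("-wal");
    PathBuf::from(name)
}

/// Hypothetical check (not the actual bottomless implementation): call it
/// after the last connection has been dropped. Any WAL file still on disk is
/// reported as an error, since it suggests the final checkpoint did not run.
fn ensure_no_wal_left(db_path: &Path) -> io::Result<()> {
    let wal = wal_path(db_path);
    match fs::metadata(&wal) {
        // The WAL file exists: treat the restore as failed.
        Ok(meta) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "BOTTOMLESS CAUTION: {} still exists ({} bytes) after the last connection was dropped",
                wal.display(),
                meta.len()
            ),
        )),
        // No WAL file: the checkpoint completed and the file was cleaned up.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        // Any other I/O error is passed through unchanged.
        Err(e) => Err(e),
    }
}
```

A caller would run `ensure_no_wal_left(Path::new("data.db"))` right after closing the restored database and abort the restore on `Err`. This sketch is strict and rejects even an empty WAL file; a real implementation might choose to tolerate that case.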

Changes

  1. Fixed the bug in the restore process when S3 holds more than 1 page of data
  2. Added validation that no WAL files remain on disk after the last connection is dropped. Otherwise bottomless now fails the restore, because the DB was most probably malformed
  3. Added a BOTTOMLESS CAUTION prefix to all cases where bottomless can behave kind of fishy
  4. Added a simple restore_from_partial_db test which drops several files from S3 and checks that the DB is able to start from this partial backup
    • It is not immediately obvious why we need to restore in such cases. But the server can crash at any point in time and we upload frame ranges in parallel, so it is a valid case for some small suffix of frame ranges to have a gap. So we can't simply fail the restore process, because that would cause trouble in an "almost valid" scenario
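To make the "gap in a suffix of frame ranges" case concrete, here is a minimal sketch of one way a restore could tolerate it: sort the uploaded ranges, keep the gap-free run that starts at the expected first frame, and set aside everything after the first gap. The `FrameRange` type and the `contiguous_prefix` function are hypothetical names for illustration; the PR text does not say that bottomless handles gaps exactly this way.

```rust
/// Inclusive range of WAL frame numbers covered by one uploaded object.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FrameRange {
    first: u32,
    last: u32,
}

/// Splits `ranges` into (usable, skipped): `usable` is the gap-free run of
/// ranges starting at `start_frame`, `skipped` is everything from the first
/// gap onwards. The input may arrive in any order, as parallel uploads would
/// produce.
fn contiguous_prefix(
    mut ranges: Vec<FrameRange>,
    start_frame: u32,
) -> (Vec<FrameRange>, Vec<FrameRange>) {
    ranges.sort_by_key(|r| r.first);
    let mut next = start_frame;
    let mut usable = Vec::new();
    let mut skipped = Vec::new();
    for r in ranges {
        // Once one range has been skipped, every later range is skipped too.
        if skipped.is_empty() && r.first == next {
            next = r.last + 1;
            usable.push(r);
        } else {
            skipped.push(r);
        }
    }
    (usable, skipped)
}
```

For example, with ranges 1-3, 4-7 and 10-12 and `start_frame = 1`, the first two ranges are usable and 10-12 is skipped because frames 8-9 are missing. A restore built on this idea could apply the usable ranges and log the skipped ones with the BOTTOMLESS CAUTION prefix instead of failing outright.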