Usage and product changes

We implement the architecture and most of the machinery required for tracking data statistics. The statistics will primarily be used for query planning. We achieved several design goals:
1) Not scanning the entire storage to update statistics
2) Allowing access to statistics for old versions, which supports even very old time travel/MVCC usage without much performance degradation
3) Not writing statistics to the storage layer in RocksDB, since updating the statistics keys on every transaction would degrade performance, and statistics can/should be a primarily in-memory structure.
However, we accept the trade-off that the statistics are not always up to date. The update frequency is a parameter we can optimise.
In the end, there is a single database-wide statistics struct, which is immutable and updated periodically. We update it by scanning the data WAL records, summing the count deltas since the last update, and then atomically replacing the Statistics struct held by the database. The statistics are checkpointed into the WAL, which also allows us to time-travel to older snapshots and find a relatively accurate statistics entry from near that version.
This means we are solidifying the requirement that WAL cleaning and MVCC compaction are tied to the same time-scale - both are required for correctly going back to previous data versions.
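As a minimal sketch of this mechanism (the types and delta format here are hypothetical stand-ins, not the real ones): the periodic update reads count deltas from WAL records newer than the current statistics version, builds a fresh immutable struct, and swaps it in atomically.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Simplified, hypothetical stand-ins for the real types: an immutable,
// database-wide statistics snapshot, plus per-commit count deltas read
// from data WAL records as (sequence number, deltas) pairs.
#[derive(Debug)]
struct Statistics {
    wal_sequence_number: u64, // the WAL position these counts reflect
    instance_counts: HashMap<String, i64>,
}

struct Database {
    // Readers clone the Arc cheaply; the updater replaces it atomically.
    statistics: RwLock<Arc<Statistics>>,
}

impl Database {
    // Periodic update: sum the count deltas from WAL records committed
    // since the last statistics version, then swap in a fresh struct.
    fn refresh_statistics(&self, wal_records: &[(u64, HashMap<String, i64>)]) {
        let current = Arc::clone(&self.statistics.read().unwrap());
        let mut counts = current.instance_counts.clone();
        let mut latest = current.wal_sequence_number;
        for (seq, deltas) in wal_records {
            if *seq <= current.wal_sequence_number {
                continue; // already reflected in the current statistics
            }
            for (type_name, delta) in deltas {
                *counts.entry(type_name.clone()).or_insert(0) += delta;
            }
            latest = latest.max(*seq);
        }
        let updated = Arc::new(Statistics { wal_sequence_number: latest, instance_counts: counts });
        *self.statistics.write().unwrap() = updated; // atomic replacement
    }
}
```

Because the struct is never mutated in place, in-flight readers keep using the snapshot they already hold while new readers see the updated one.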
Future work
We could find that reading from the WAL to update statistics is a bottleneck. We can solve several problems at once by extracting the commit data "cache" from the IsolationManager into the DurabilityClient, which can then be shared across isolation and statistics operations.
Implementation
Architecture
Promote WAL and Checkpoint management into Database
We move Checkpoint and the associated commit_replay methods into a new module: //storage/recovery
We also add an (ultimately unused) system to 'extend' a checkpoint with additional data
Split database creation and loading into two separate entry points, and rearrange the corresponding methods all the way down into Storage, WAL, and Checkpointing. We also update corresponding tests.
Given a database directory, each module creates its own subdirectory:
MVCCStorage creates db-name/storage
WAL creates db-name/wal
Checkpoint creates db-name/checkpoint
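The per-module layout above could be sketched as follows (the helper names are hypothetical; only the subdirectory names come from the list above):

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical helper mirroring the layout described above: given the
// database directory, each module owns exactly one subdirectory.
fn module_dir(database_dir: &Path, module: &str) -> std::io::Result<PathBuf> {
    let dir = database_dir.join(module);
    fs::create_dir_all(&dir)?; // created on database creation, reused on load
    Ok(dir)
}

fn create_database_layout(database_dir: &Path) -> std::io::Result<()> {
    // Each module creates its own subdirectory under db-name/.
    for module in ["storage", "wal", "checkpoint"] {
        module_dir(database_dir, module)?;
    }
    Ok(())
}
```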
Introduce //concept/thing/Statistics, which stores thing statistics for instances of each type, role playing, and relation indexing, etc.
We checkpoint statistics into the WAL using a new record type, and load the last one on bootup. This also helps solve the MVCC time-travel problem, where going back in time could lead to mismatched statistics being used (or even not relevant for the different schema!). We probably allow using the existing Statistics if the "old" sequence number being opened is no more than N (~100) versions behind the statistics version.
This means we are solidifying the requirement that WAL cleaning and MVCC compaction are tied to the same time-scale - both are required for correctly going back to previous data versions.
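A rough sketch of the statistics shape and the version-lag reuse rule described above (field names and the struct shape are illustrative, not the real definitions):

```rust
use std::collections::HashMap;

// Hypothetical shape of the thing statistics: counts per type, plus
// role-playing counts; the real struct also covers relation indexing, etc.
struct ThingStatistics {
    sequence_number: u64, // the WAL version this snapshot reflects
    instance_counts: HashMap<String, u64>,               // per entity/relation type
    role_player_counts: HashMap<(String, String), u64>,  // (relation type, role)
}

// The reuse rule from the text: a snapshot opened at an older version may
// still use the current statistics if it is no more than N (~100) versions
// away from the statistics version.
const MAX_VERSION_LAG: u64 = 100;

fn statistics_usable_for(statistics: &ThingStatistics, open_sequence_number: u64) -> bool {
    statistics.sequence_number.abs_diff(open_sequence_number) <= MAX_VERSION_LAG
}
```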
Statistics catch-up/synchronisation is implemented by reading data commit records from the WAL.
For this, we re-create write snapshots from commit data read from disk. However, this constructor intentionally returns a narrower API, which means we cannot write or commit to a re-created write snapshot.
We add CommitType to the CommitRecord generated by CommittableSnapshots. This allows deserialising and recreating the correct type of snapshot (data or schema) from a WAL entry.
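A hypothetical sketch of both mechanisms: commit records carry a CommitType tag so WAL replay can tell data and schema commits apart, and a snapshot re-created from a commit record exposes a deliberately narrow, read-only surface. All names here are illustrative, not the real API.

```rust
// Tag serialised into each commit record so replay can recreate the
// correct kind of snapshot (data or schema).
#[derive(Clone, Copy, Debug, PartialEq)]
enum CommitType {
    Data,
    Schema,
}

impl CommitType {
    fn to_byte(self) -> u8 {
        match self {
            CommitType::Data => 0,
            CommitType::Schema => 1,
        }
    }

    fn from_byte(byte: u8) -> Option<CommitType> {
        match byte {
            0 => Some(CommitType::Data),
            1 => Some(CommitType::Schema),
            _ => None,
        }
    }
}

struct CommitRecord {
    commit_type: CommitType,
    operations: Vec<u8>, // serialised snapshot operations
}

// Re-created from a WAL entry: it can be inspected (e.g. to sum statistics
// deltas), but by construction it has no write() or commit() methods.
struct RecreatedSnapshot {
    commit_type: CommitType,
    operations: Vec<u8>,
}

impl RecreatedSnapshot {
    fn from_commit_record(record: CommitRecord) -> RecreatedSnapshot {
        RecreatedSnapshot { commit_type: record.commit_type, operations: record.operations }
    }

    fn operations(&self) -> &[u8] {
        &self.operations
    }
}
```

Encoding the read-only restriction in the type itself means misuse is a compile error rather than a runtime check.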
Refactor out the //storage/durability module into //durability package, which contains a simplified Service trait
We then create the DurabilityClient trait, which is now used throughout the code base wherever DurabilityService was used before
The intent is to allow extracting Durability into remote machine(s) using a Calvin-style partitioned WAL, if we wanted to. We will use the client to communicate with a set of durability servers, and manage collecting ordering information, etc.
For now, we only have a WALClient which wraps a WAL but conforms to the DurabilityClient trait
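The trait split could look roughly like this (a minimal sketch with simplified record and sequence-number types; the real trait surface is larger):

```rust
// Hypothetical minimal DurabilityClient: the rest of the code base programs
// against this trait, so durability could later move to remote servers
// (e.g. a Calvin-style partitioned WAL) without touching call sites.
trait DurabilityClient {
    fn append(&mut self, record: Vec<u8>) -> u64; // returns the sequence number
    fn read_from(&self, sequence_number: u64) -> Vec<&[u8]>;
}

// Local WAL stand-in: an in-memory list of records for illustration.
struct Wal {
    records: Vec<Vec<u8>>,
}

// The only implementation for now: wraps a local WAL but conforms to the
// client trait.
struct WalClient {
    wal: Wal,
}

impl DurabilityClient for WalClient {
    fn append(&mut self, record: Vec<u8>) -> u64 {
        self.wal.records.push(record);
        self.wal.records.len() as u64 // 1-based sequence numbers
    }

    fn read_from(&self, sequence_number: u64) -> Vec<&[u8]> {
        self.wal.records[(sequence_number as usize - 1)..]
            .iter()
            .map(|record| record.as_slice())
            .collect()
    }
}
```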
UX
We create a more consistent/comprehensive error structure for what happens if any of storage/wal/checkpoint are not present on bootup.
The presence or absence of the storage directory is irrelevant to bootup/recovery (the same path is taken either way). Its presence simply optimises the recovery process, since we have to copy fewer files from the checkpoints.
If the WAL is present on bootup, but no checkpoint is provided, we replay the WAL from scratch
If the WAL is present on bootup, and a checkpoint is provided, we replace the storage with the checkpoint and replay the WAL since the checkpoint
If the db-name database directory is present, but no WAL is present, this is an error state.
If the WAL directory is present, but for any reason data that is required from the WAL is not present (for example, deleted or cleaned up), this is an error. This could happen when replaying the WAL from the start with no checkpoint, or when a checkpoint is provided but the required replay point is no longer available in the WAL.
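The bootup cases above can be encoded compactly; this is a hypothetical condensation of the rules, not the real error types. The WAL is the source of truth, the checkpoint only shortens replay, and the storage directory does not affect which path is taken:

```rust
#[derive(Debug, PartialEq)]
enum Recovery {
    ReplayWalFromStart,          // WAL present, no checkpoint
    RestoreCheckpointThenReplay, // replace storage with checkpoint, replay since it
}

#[derive(Debug, PartialEq)]
enum BootError {
    MissingWal,         // database directory exists but the WAL does not
    WalDataUnavailable, // required WAL records were deleted/cleaned up
}

// Note: BootError::WalDataUnavailable is only discovered later, during
// replay itself, when a required record turns out to be missing.
fn plan_recovery(wal_present: bool, checkpoint_present: bool) -> Result<Recovery, BootError> {
    match (wal_present, checkpoint_present) {
        (false, _) => Err(BootError::MissingWal),
        (true, false) => Ok(Recovery::ReplayWalFromStart),
        (true, true) => Ok(Recovery::RestoreCheckpointThenReplay),
    }
}
```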