readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, ReadySet caches the results of cached select statements and incrementally updates these results over time as the underlying data changes.
https://readyset.io

Create upgrade mechanism to fix up underlying data #1359

Open altmannmarcelo opened 2 months ago

altmannmarcelo commented 2 months ago

Description

Sometimes we find and fix issues that change how we store data. One example is the recent issue with the DATE field, where we failed to set the date_only bit on the TimestampTz type.

Customers who have already snapshotted (say, TB+ of data) should not be required to re-snapshot. We should be able to provide some sort of upgrade script that knows what has to be fixed based on the current data and version.

I think we already store something like a serde version, but that might not be the exact field/version we want, as we are not necessarily changing how we store data internally.

We might want to add a new metadata field, such as a data-dictionary version, and bump it every time we make a change that requires an upgrade of stored data. With this information we should be able to figure out which changes we need to apply.

Example:

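// The ordering derives let a stored version be compared with the binary's version.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DdVersion {
    Version0 = 0, // Initial version
    Version1, // Fixed the issue with missing date_only in TimestampTz
    Version2, // Fixed something else
}

// Version compiled into the running binary
let dd_version = DdVersion::Version2;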

Once we start Readyset, we can check the DdVersion stored in the metadata rocksdb and compare it with the dd_version of the running binary. At that point we can either run the necessary upgrades (which will be defined on a case-by-case basis) or ask the user to explicitly run readyset with the --upgrade flag.
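To make the startup check concrete, here is a rough sketch of how the comparison could work. It is only an illustration: `StartupAction` and `plan_startup` are hypothetical names, not existing Readyset code, and it assumes the stored version has already been read from the metadata store. It also assumes upgrades only run when --upgrade is passed (the issue leaves open whether they could run automatically) and that a stored version newer than the binary should be refused.

```rust
/// Hypothetical data-dictionary version, mirroring the enum proposed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DdVersion {
    Version0 = 0, // Initial version
    Version1,     // Fixed the issue with missing date_only in TimestampTz
    Version2,     // Fixed something else
}

impl DdVersion {
    /// Every known version, in ascending order.
    const ALL: [DdVersion; 3] = [
        DdVersion::Version0,
        DdVersion::Version1,
        DdVersion::Version2,
    ];
}

/// What startup should do after comparing versions (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum StartupAction {
    /// Stored data already matches the running binary.
    UpToDate,
    /// Apply the fix-ups for these versions, oldest first.
    RunUpgrades(Vec<DdVersion>),
    /// Fix-ups are pending, but the user did not pass --upgrade.
    RequireUpgradeFlag(Vec<DdVersion>),
    /// Stored data was written by a newer binary (assumed unsupported).
    StoredDataTooNew,
}

/// Decide what to do given the version found in the metadata store,
/// the version of the running binary, and whether --upgrade was passed.
fn plan_startup(stored: DdVersion, running: DdVersion, upgrade_flag: bool) -> StartupAction {
    if stored == running {
        return StartupAction::UpToDate;
    }
    if stored > running {
        return StartupAction::StoredDataTooNew;
    }
    // Every version after `stored`, up to and including `running`,
    // represents one fix-up that still has to be applied.
    let pending: Vec<DdVersion> = DdVersion::ALL
        .iter()
        .copied()
        .filter(|v| *v > stored && *v <= running)
        .collect();
    if upgrade_flag {
        StartupAction::RunUpgrades(pending)
    } else {
        StartupAction::RequireUpgradeFlag(pending)
    }
}
```

For example, a deployment snapshotted at Version0 and started with a Version2 binary and --upgrade would get `RunUpgrades([Version1, Version2])`, so the date_only fix-up runs before the later one. Without the flag, the same list comes back as `RequireUpgradeFlag`, which could be used to print what is pending.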

Change in user-visible behavior

Requires documentation change