readysettech / readyset

Readyset is a MySQL- and Postgres-wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, Readyset caches the results of the SELECT statements you choose to cache and incrementally updates those results as the underlying data changes.
https://readyset.io

Add a new Readyset command to snapshot a new table #1369

Open: altmannmarcelo opened this issue 1 week ago

altmannmarcelo commented 1 week ago

Description

When replication filters are used to select which tables to snapshot, adding a new replicated table currently requires bouncing (restarting) the instance.

We should create a new command that adds a new table to the set of replicated tables.

Change in user-visible behavior

Requires documentation change

altmannmarcelo commented 3 days ago

I don't agree. We should give users the ability to select which tables they want to snapshot and to add new tables later, while also allowing them to snapshot everything. With your idea, if I know I won't use a table that has TBs of data, why should we snapshot it first only to discard it later?

davisjc commented 3 days ago

I agree we should avoid snapshotting a table if we'll only then discard it later. I suppose I'd want an easy way to specify the blacklist in advance.

The whole problem seems very similar to the --replication-tables and --replication-tables-ignore arguments, which frame the choice as either a whitelist or a blacklist, respectively.
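The two framings can be modeled as one filter with an allow-list mode and an ignore-list mode. The Python sketch below is hypothetical: `FilterMode`, `ReplicationFilter`, and `should_replicate` are illustrative names, not Readyset code, and tables are matched by exact `schema.table` name (an assumption; this thread does not describe the flags' matching syntax).

```python
from dataclasses import dataclass, field
from enum import Enum


class FilterMode(Enum):
    """Hypothetical model of the two framings, not Readyset's types."""
    ALLOW_LIST = "allow"    # like --replication-tables: only listed tables
    IGNORE_LIST = "ignore"  # like --replication-tables-ignore: all but listed


@dataclass
class ReplicationFilter:
    mode: FilterMode
    # Assumption: exact "schema.table" names, no wildcard patterns.
    tables: set[str] = field(default_factory=set)

    def should_replicate(self, table: str) -> bool:
        listed = table in self.tables
        # Allow list: replicate only listed tables.
        # Ignore list: replicate everything except listed tables.
        return listed if self.mode is FilterMode.ALLOW_LIST else not listed
```

With an allow list, every new table needs an explicit entry; with an ignore list, new tables are replicated unless someone lists them, which is the operational difference discussed in this thread.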

Setting aside the specifics of how Readyset currently does snapshotting, the blacklist approach makes a lot of sense to me: presume everything is replicated, with a handful of blacklisted exceptions.

Explicitly whitelisting the tables you want replicated could make sense in some situations too, but I don't expect that direction to be as common or desirable. It seems like another operational step customers must take whenever they add a new table to their application.

I'm not sure what I'm proposing yet, so I'm thinking out loud here, but maybe it would make sense to have a way to start ReadySet for the first time (without implicitly also starting replication), examine the upstream tables, define a blacklist that makes sense, and then tell ReadySet to start snapshotting/replicating.

After the initial setup, I'd expect we'd want ReadySet to continue snapshotting and replication by default on subsequent process launches.

altmannmarcelo commented 3 days ago

We need to allow for both use cases:

1. Replicate only a handpicked 10 tables.
2. Replicate everything except a handpicked 10 tables.

After Readyset has started, I want to be able to add a new table to either of those lists. We should have a command to accomplish this, and that is what this ticket is about.
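A minimal sketch of what such a command could do at runtime, assuming an in-memory state with an optional allow list and an ignore list. `ReplicationState`, `replicate_table`, and `ignore_table` are hypothetical names; the `snapshotted` set stands in for actually snapshotting or dropping a table's data, and letting the ignore list take precedence over the allow list is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReplicationState:
    """Hypothetical runtime state; not Readyset's actual data structures."""
    allowed: Optional[set[str]] = None  # None means "no allow list: replicate all"
    ignored: set[str] = field(default_factory=set)
    snapshotted: set[str] = field(default_factory=set)  # stand-in for snapshot data

    def is_replicated(self, table: str) -> bool:
        # Assumption: the ignore list wins over the allow list.
        if table in self.ignored:
            return False
        return self.allowed is None or table in self.allowed

    def replicate_table(self, table: str) -> bool:
        """Start replicating `table` without a restart.
        Returns True if a new snapshot had to be scheduled."""
        self.ignored.discard(table)
        if self.allowed is not None:
            self.allowed.add(table)
        if table in self.snapshotted:
            return False  # already snapshotted; nothing more to do
        self.snapshotted.add(table)  # stand-in for snapshotting the table
        return True

    def ignore_table(self, table: str) -> None:
        """Stop replicating `table` without a restart."""
        self.ignored.add(table)
        if self.allowed is not None:
            self.allowed.discard(table)
        self.snapshotted.discard(table)  # stand-in for dropping its data
```

Returning whether a snapshot was scheduled lets a caller report when the command triggers new snapshot work, which matters for the multi-terabyte tables mentioned earlier.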

davisjc commented 3 days ago

Those two use cases make sense. For the first one, where we're only replicating a handpicked 10, I think running a command to add the table makes sense.

For the second use case, where we're replicating everything but a handpicked 10, I think it would be unfortunate if the user had to manually add this new table (after the first 990 were implicitly chosen for replication).

altmannmarcelo commented 1 day ago

> For the second use case, where we're replicating everything but a handpicked 10, I think it would be unfortunate if the user had to manually add this new table (after the first 990 were implicitly chosen for replication).

This already happens automatically: when the replicator sees the DDL for the new table and the table does not match --replication-tables-ignore, Readyset adds it. That is how filtering currently works.
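The ignore-list behavior described here can be sketched as an event handler. This is a hypothetical illustration, not Readyset's replicator: `CreateTable` and `Replicator` are made-up names, and only exact `schema.table` matches against the ignore set are considered.

```python
from dataclasses import dataclass, field


@dataclass
class CreateTable:
    """Hypothetical DDL event decoded from the replication stream."""
    schema: str
    name: str


@dataclass
class Replicator:
    # Hypothetical stand-in for --replication-tables-ignore (exact names only).
    ignored: set[str] = field(default_factory=set)
    replicated: set[str] = field(default_factory=set)

    def on_create_table(self, event: CreateTable) -> bool:
        """Start replicating a newly created table unless it is ignored.
        Returns True if the table was added."""
        qualified = f"{event.schema}.{event.name}"
        if qualified in self.ignored:
            return False
        self.replicated.add(qualified)
        return True
```

Under an allow list, the same event would not add the table on its own, which is why the explicit command is needed for the first use case.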