valkey-io / valkey

A flexible distributed key-value datastore that is optimized for caching and other realtime workloads.
https://valkey.io

[NEW] Atomic slot migration HLD #23

Open PingXie opened 7 months ago

PingXie commented 7 months ago

Yet another gem that we should seriously consider for our new project. The original proposal (and its thread) is too long to reproduce here, so I am going to leave it at https://github.com/redis/redis/issues/10933.

Note that this feature will be orthogonal to how cluster topology management is done.

PingXie commented 7 months ago

@madolson and I discussed another option (in the context of atomic slot migration), which essentially implements slot-level replication. Here is the high-level flow:

  1. source (parent) forks a child with a set of slots to migrate
  2. child streams slots to be migrated in the RDB format (same as replication)
  3. target needs to support non-blocking load of the RDB stream (think of coroutines)
  4. source (parent) starts capturing updates to the migrating slots right after the fork
  5. child completes streaming
  6. source (parent) pauses clients writing to these slots
  7. source replicates captured updates to target
  8. source starts the slot ownership transfer process (depending on cluster v1 vs v2, we could take different paths)
  9. source unblocks paused clients with -MOVED <target>

Any failure on the target before step 8 completes would abort the migration process on the source.
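
To make the ordering of the steps concrete, here is a rough, single-process Python sketch of the flow. It is a toy model under stated assumptions, not Valkey code: `Node`, `migrate_slots`, `MigrationAborted`, and `fail_on_load` are all hypothetical names, the "fork" is just an in-memory copy, the RDB stream is a plain dict, and the client pause in step 6 is implicit because nothing runs concurrently. The redirect string follows the usual cluster `MOVED <slot> <host:port>` shape.

```python
from dataclasses import dataclass, field


class MigrationAborted(Exception):
    """Raised when the (simulated) target fails before ownership moves."""


@dataclass
class Node:
    name: str
    owned_slots: set = field(default_factory=set)
    data: dict = field(default_factory=dict)  # slot -> {key: value}
    fail_on_load: bool = False  # hypothetical hook to simulate a target failure

    def load_snapshot(self, snapshot):
        if self.fail_on_load:
            raise MigrationAborted(f"{self.name} failed while loading")
        for slot, keys in snapshot.items():
            self.data[slot] = dict(keys)

    def apply(self, slot, key, value):
        self.data.setdefault(slot, {})[key] = value


def migrate_slots(source, target, slots, writes_during_snapshot=()):
    """Toy model of steps 1-9; returns redirect replies, or None on abort."""
    # Steps 1-2: the "fork" is modelled as a point-in-time copy of the slots.
    snapshot = {s: dict(source.data.get(s, {})) for s in slots}

    # Step 4: capture updates that hit the migrating slots after the fork.
    captured = []
    for slot, key, value in writes_during_snapshot:
        source.apply(slot, key, value)
        if slot in slots:
            captured.append((slot, key, value))

    try:
        # Steps 3 and 5: the target loads the snapshot stream.
        target.load_snapshot(snapshot)
        # Step 6: writes would be paused here (implicit in this toy model).
        # Step 7: replay the captured updates on the target.
        for slot, key, value in captured:
            target.apply(slot, key, value)
    except MigrationAborted:
        # Failure before step 8: the source keeps ownership and the target
        # discards any partially loaded data.
        for s in slots:
            target.data.pop(s, None)
        return None

    # Step 8: transfer slot ownership from source to target.
    for s in slots:
        source.owned_slots.discard(s)
        target.owned_slots.add(s)
        source.data.pop(s, None)

    # Step 9: the reply each paused client would receive.
    return {s: f"-MOVED {s} {target.name}" for s in slots}
```

The point of the sketch is the ordering: the target only becomes the owner after it holds both the snapshot and every update captured since the fork, and an abort leaves the source untouched.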

zuiderkwast commented 7 months ago

I have an implementation. Didn't write it down yet, but I recorded a song which explains it all. Please review. https://suno.com/song/b06c2a5b-3760-4916-9f56-eb3fe66f24e2

PingXie commented 6 months ago

> I have an implementation. Didn't write it down yet, but I recorded a song which explains it all. Please review. https://suno.com/song/b06c2a5b-3760-4916-9f56-eb3fe66f24e2

Are you referring to https://github.com/valkey-io/valkey/pull/298? I will take a look at it next, but I don't see it solving this issue.

zuiderkwast commented 6 months ago

@PingXie The song is about atomic slot migration. #298 is about subscribing to slot migrations. Not the same.

PingXie commented 6 months ago

> @PingXie The song is about atomic slot migration.

Ship-it