microsoft / garnet

Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication features. Garnet can work with existing Redis clients.
https://microsoft.github.io/garnet/
MIT License

Allow Write Operations when Slot is in MIGRATING state #474

Closed vazois closed 1 day ago

vazois commented 1 week ago

This PR resolves issue #354.


Slot migration proceeds in several stages that control data access, preserving both data integrity and availability while keys are transferred from the source node to the target node. The MIGRATE command supports two transfer options, MIGRATE KEYS and MIGRATE SLOTS. Both use a common data access control interface that is logically divided into the following categories:

  1. Slot level access control: used to orchestrate migration by changing the state of the associated slot (i.e. MIGRATING, IMPORTING) at the corresponding source and target nodes.
  2. Key level access control: used to control access to individual keys in order to ensure high availability and data integrity.

Key level access control

Each migrate session maintains a dictionary of <key, KeyMigrationStatus> pairs. This dictionary controls access to the keys that are actively managed by that running migrate session. When a slot is being migrated and no active migrate session manages a specific key, any request for that key is handled under the assumption that the key exists. If it does not, an -ASK redirect is returned that points to the endpoint of the target node.
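The routing rule in the paragraph above can be sketched as follows. This is a minimal illustrative model in Python, not Garnet's C# implementation: the function name `route_request`, the tuple return shape, and the representation of sessions as plain dictionaries are all assumptions made for the example. The redirect string follows the Redis cluster convention `-ASK <slot> <host>:<port>`. How each KeyMigrationStatus affects readers and writers is deliberately not modeled here.

```python
from typing import Dict, List, Tuple


def route_request(
    key: str,
    slot: int,
    store: Dict[str, str],
    sessions: List[Dict[str, str]],
    target_endpoint: str,
) -> Tuple[str, str]:
    """Route a request for `key` whose slot is in the MIGRATING state.

    Hypothetical helper: `sessions` holds one {key: status} dictionary per
    active migrate session, and `store` stands in for the local key store.
    """
    # A key tracked by an active migrate session is governed by that
    # session's KeyMigrationStatus rules (not modeled in this sketch).
    for session in sessions:
        if key in session:
            return ("session", session[key])

    # No session manages the key: assume it exists and serve it locally.
    if key in store:
        return ("value", store[key])

    # The key does not exist locally: redirect the client to the target node.
    return ("redirect", f"-ASK {slot} {target_endpoint}")


if __name__ == "__main__":
    store = {"k1": "v1"}
    sessions = [{"k2": "MIGRATING"}]
    for k in ("k1", "k2", "k3"):
        print(k, route_request(k, 42, store, sessions, "10.0.0.2:7001"))
```

The three calls exercise the three branches: a locally served key, a key owned by a session, and a missing key that produces the redirect.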

The KeyMigrationStatus will affect individual session readers and writers as follows:

Key Transfer State Machine Algorithm

  1. Add the keys to the dictionary and initialize their KeyMigrationStatus to QUEUED.
  2. Perform the following for the main store and the object store separately:
    1. Transition the keys from QUEUED to MIGRATING.
    2. Wait for the status change to propagate, using epoch protection.
    3. For every key in the MIGRATING state:
      1. Look up the key in the given store.
      2. If the key is found, send it to the target node (in batches).
      3. If the key is not found, change its state back to QUEUED to unblock any writers.
    4. If the copy option is disabled:
      1. Transition the keys from MIGRATING to DELETING.
      2. Wait for the status change to propagate.
      3. For every key in the DELETING state, delete it and change its state to MIGRATED.
    5. If the copy option is enabled, transition the keys from MIGRATING to MIGRATED.
graph TD;
    QD-->MG;
    MG-->DL;
    MG-->QD;
    MG-->MD;
    DL-->MD;
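
The state machine above can be modeled in a few lines. The sketch below is an illustrative Python model under stated assumptions, not the PR's C# code: it treats a store as a plain dictionary, handles a single store (the algorithm runs once for the main store and once for the object store), and replaces the epoch-protected wait with a comment. It also assumes that QD, MG, DL and MD in the graph abbreviate QUEUED, MIGRATING, DELETING and MIGRATED. The names `ALLOWED`, `transition` and `migrate_keys` are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable


class KeyMigrationStatus(Enum):
    QUEUED = "QD"
    MIGRATING = "MG"
    DELETING = "DL"
    MIGRATED = "MD"


S = KeyMigrationStatus

# Edges of the transition graph: QD->MG, MG->DL, MG->QD, MG->MD, DL->MD.
ALLOWED = {
    S.QUEUED: {S.MIGRATING},
    S.MIGRATING: {S.DELETING, S.QUEUED, S.MIGRATED},
    S.DELETING: {S.MIGRATED},
    S.MIGRATED: set(),
}


def transition(status: Dict[str, S], key: str, new: S) -> None:
    """Move `key` to `new`, rejecting edges that are not in the graph."""
    if new not in ALLOWED[status[key]]:
        raise ValueError(f"illegal transition {status[key].name} -> {new.name}")
    status[key] = new


def migrate_keys(keys: Iterable[str], source: dict, target: dict,
                 copy: bool = False) -> Dict[str, S]:
    """Run the key transfer steps for one store and return the final states."""
    status = {k: S.QUEUED for k in keys}              # step 1
    for k in status:
        transition(status, k, S.MIGRATING)            # step 2.1
    # Step 2.2: the real system waits here for the change to propagate.

    batch = {}
    for k in status:                                  # step 2.3
        if k in source:
            batch[k] = source[k]
        else:
            transition(status, k, S.QUEUED)           # unblock writers
    target.update(batch)                              # send found keys as a batch

    sent = [k for k, s in status.items() if s is S.MIGRATING]
    if copy:
        for k in sent:
            transition(status, k, S.MIGRATED)         # step 2.5
    else:
        for k in sent:
            transition(status, k, S.DELETING)         # step 2.4.1
        # Step 2.4.2: wait for propagation again.
        for k in sent:
            del source[k]                             # step 2.4.3
            transition(status, k, S.MIGRATED)
    return status
```

With `copy=False` the found keys end up only on the target, while a key that was not found returns to QUEUED; with `copy=True` the source keeps its copy and the keys move directly from MIGRATING to MIGRATED.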

RespClusterBench - Slot in STABLE state

Main

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|---|---|
| Get | .NET 6 | Empty | .NET 6.0 | 28.82 us | 0.078 us | 0.069 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 34.55 us | 0.053 us | 0.047 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 25.68 us | 0.018 us | 0.017 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 26.02 us | 0.033 us | 0.031 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 21.71 us | 0.043 us | 0.038 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 20.69 us | 0.037 us | 0.035 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 19.08 us | 0.049 us | 0.046 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 19.02 us | 0.011 us | 0.010 us | - |

PR #474

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|---|---|
| Get | .NET 6 | Empty | .NET 6.0 | 28.14 us | 0.019 us | 0.017 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 29.66 us | 0.020 us | 0.017 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 24.48 us | 0.049 us | 0.045 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 28.09 us | 0.244 us | 0.228 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 21.16 us | 0.011 us | 0.009 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 20.34 us | 0.012 us | 0.011 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 18.24 us | 0.045 us | 0.042 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 17.25 us | 0.044 us | 0.041 us | - |

Diff (%): (Main - PR) / Main, positive means the PR has the lower mean

| Method | Job | EnvironmentVariables | Runtime | Mean |
|---|---|---|---|---|
| Get | .NET 6 | Empty | .NET 6.0 | 2.36 % |
| Set | .NET 6 | Empty | .NET 6.0 | 14.15 % |
| MGet | .NET 6 | Empty | .NET 6.0 | 4.67 % |
| MSet | .NET 6 | Empty | .NET 6.0 | -7.96 % |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 2.53 % |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 1.69 % |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 4.4 % |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 9.31 % |

RespClusterMigrateBench - Slot in MIGRATING state

Main

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|---|---|
| Get | .NET 6 | Empty | .NET 6.0 | 49.16 us | 0.222 us | 0.208 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 60.81 us | 0.688 us | 0.643 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 47.23 us | 0.175 us | 0.164 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 49.34 us | 0.197 us | 0.184 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 39.66 us | 0.150 us | 0.140 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 39.40 us | 0.020 us | 0.019 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 36.54 us | 0.017 us | 0.015 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 38.56 us | 0.086 us | 0.080 us | - |

PR #474

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|---|---|
| Get | .NET 6 | Empty | .NET 6.0 | 52.76 us | 0.162 us | 0.151 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 64.02 us | 0.277 us | 0.259 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 47.87 us | 0.151 us | 0.141 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 49.47 us | 0.187 us | 0.175 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 40.23 us | 0.025 us | 0.020 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 41.76 us | 0.028 us | 0.026 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 37.38 us | 0.109 us | 0.097 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 36.85 us | 0.037 us | 0.033 us | - |

Diff (%): (Main - PR) / Main, positive means the PR has the lower mean

| Method | Job | EnvironmentVariables | Runtime | Mean |
|---|---|---|---|---|
| Get | .NET 6 | Empty | .NET 6.0 | -7.32 % |
| Set | .NET 6 | Empty | .NET 6.0 | -5.28 % |
| MGet | .NET 6 | Empty | .NET 6.0 | -1.36 % |
| MSet | .NET 6 | Empty | .NET 6.0 | -0.26 % |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | -1.44 % |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | -5.99 % |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | -2.3 % |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 4.43 % |
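
The Diff (%) figures in both benchmarks appear to be the relative change in mean latency, (Main - PR) / Main × 100, so a positive value means the PR measured a lower mean than Main. The short Python sketch below recomputes them from the Mean columns reported above. The formula is inferred from the numbers rather than stated in the PR, and the dictionary layout and the name `diff_percent` are assumptions made for the example.

```python
# Mean latencies in microseconds, copied from the tables above.
# Keys are (method, runtime); values are (Main mean, PR #474 mean).
STABLE = {
    ("Get", ".NET 6.0"): (28.82, 28.14),
    ("Set", ".NET 6.0"): (34.55, 29.66),
    ("MGet", ".NET 6.0"): (25.68, 24.48),
    ("MSet", ".NET 6.0"): (26.02, 28.09),
    ("Get", ".NET 8.0"): (21.71, 21.16),
    ("Set", ".NET 8.0"): (20.69, 20.34),
    ("MGet", ".NET 8.0"): (19.08, 18.24),
    ("MSet", ".NET 8.0"): (19.02, 17.25),
}

MIGRATING = {
    ("Get", ".NET 6.0"): (49.16, 52.76),
    ("Set", ".NET 6.0"): (60.81, 64.02),
    ("MGet", ".NET 6.0"): (47.23, 47.87),
    ("MSet", ".NET 6.0"): (49.34, 49.47),
    ("Get", ".NET 8.0"): (39.66, 40.23),
    ("Set", ".NET 8.0"): (39.40, 41.76),
    ("MGet", ".NET 8.0"): (36.54, 37.38),
    ("MSet", ".NET 8.0"): (38.56, 36.85),
}


def diff_percent(main_mean: float, pr_mean: float) -> float:
    """Relative change in mean latency; positive means the PR is faster."""
    return (main_mean - pr_mean) / main_mean * 100.0


if __name__ == "__main__":
    for name, table in (("STABLE", STABLE), ("MIGRATING", MIGRATING)):
        print(name)
        for (method, runtime), (main_mean, pr_mean) in table.items():
            diff = diff_percent(main_mean, pr_mean)
            print(f"  {method:<5} {runtime}  {diff:+.2f} %")
```

Read this way, the STABLE numbers show the PR at or below Main in seven of eight rows, while the MIGRATING numbers show a small cost in seven of eight rows, with the largest at 7.32 %.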