microsoft / garnet

Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication features. Garnet can work with existing Redis clients.
https://microsoft.github.io/garnet/
MIT License

Issuing "CLUSTER FORGET importingNodeId" to a Migrating Node Mid-Migration Causes Slot to Point to Another Node #494

Closed priyanjgupta closed 1 day ago

priyanjgupta commented 3 days ago

Describe the bug

When migrating slots from one node to another, an issue may arise that requires resetting the importing node mid-migration. In that case, issuing CLUSTER FORGET to the migrating node does not reset the state of the migrating slot. Instead, the migrating slot ends up pointing to another node in the cluster, which breaks write operations to that slot.

As a result, new keys cannot be written to the migrating slot.

Steps to reproduce the bug

Suppose we have a cluster with three nodes: NodeA (slots 0-16383), NodeB, and NodeC. To reproduce the bug, follow these steps:

  1. Node A: CLUSTER SETSLOT X IMPORTING nodeidB
  2. Node B: CLUSTER SETSLOT X MIGRATING nodeidA
  3. Node A: CLUSTER RESET HARD
  4. Node C: CLUSTER FORGET nodeidA
  5. Node B: CLUSTER NODES
  6. Node B: CLUSTER FORGET nodeidA
  7. Node B: CLUSTER NODES

The CLUSTER NODES output differs between steps 5 and 7: after the CLUSTER FORGET in step 6, slot X points to another node. Ideally, the state of slot X should have been reset instead.
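
To make the comparison between steps 5 and 7 easier to check, here is a minimal sketch of a helper that extracts open (migrating or importing) slots from CLUSTER NODES text. It assumes the output uses the Redis-style annotations `[slot->-nodeid]` (migrating) and `[slot-<-nodeid]` (importing) after the eight leading fields; the function names are hypothetical and this has not been verified against Garnet's exact output.

```python
import re

# Redis-style open-slot annotations (assumed to match Garnet's output):
#   [slot->-node_id]  slot is migrating to node_id
#   [slot-<-node_id]  slot is being imported from node_id
OPEN_SLOT = re.compile(r"^\[(\d+)-([<>])-([^\]]+)\]$")


def open_slots(cluster_nodes_output):
    """Return {slot: (state, peer_node_id)} for every migrating/importing slot."""
    result = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        # Fields 0-7: id, address, flags, master, ping, pong, epoch, link state.
        # Everything after that is a slot, a slot range, or an open-slot annotation.
        for token in fields[8:]:
            match = OPEN_SLOT.match(token)
            if match:
                slot, arrow, peer = match.groups()
                state = "migrating" if arrow == ">" else "importing"
                result[int(slot)] = (state, peer)
    return result


def slots_referencing(cluster_nodes_output, node_id):
    """Return the sorted open slots that still point at node_id."""
    slots = open_slots(cluster_nodes_output)
    return sorted(s for s, (_, peer) in slots.items() if peer == node_id)
```

Running `open_slots` on the output captured at step 5 and again at step 7 shows which node slot X points to before and after the CLUSTER FORGET. With the expected behavior, slot X would no longer appear in the step 7 result.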

Expected behavior

Whenever CLUSTER FORGET targetnodeid is issued to a node that is migrating a slot to, or importing a slot from, the target node, the state of that slot should be reset.
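
The expected behavior can be sketched with a toy model. This is not Garnet's implementation: `NodeView`, its fields, and its methods are hypothetical names used only to illustrate that forgetting a node should also clear any slot state that refers to it.

```python
from dataclasses import dataclass, field

STABLE, MIGRATING, IMPORTING = "stable", "migrating", "importing"


@dataclass
class NodeView:
    """Simplified, hypothetical view that one node holds of the cluster."""
    known_nodes: set = field(default_factory=set)
    # slot -> (state, peer_node_id); only non-stable slots are stored
    open_slots: dict = field(default_factory=dict)

    def set_slot(self, slot, state, peer):
        """Mark a slot as migrating to, or importing from, peer."""
        self.open_slots[slot] = (state, peer)

    def forget(self, node_id):
        """Drop node_id and reset every slot whose open state refers to it."""
        self.known_nodes.discard(node_id)
        self.open_slots = {
            slot: (state, peer)
            for slot, (state, peer) in self.open_slots.items()
            if peer != node_id
        }

    def slot_state(self, slot):
        """Return (state, peer) for a slot; stable slots have no peer."""
        return self.open_slots.get(slot, (STABLE, None))
```

In this model, after `forget("A")` any slot that was migrating to or importing from node A reads as stable again, while open slots that involve other nodes are left untouched.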
