Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication features. Garnet can work with existing Redis clients.
When migrating slots from one node to another, if a problem forces the importing node to be reset mid-migration, issuing CLUSTER FORGET to the migrating node does not reset the state of the migrating slot. Instead, the migrating slot ends up pointing to another node in the cluster, which breaks write operations on that slot.
As a result, new keys can't be written to the migrating slot.
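To see where a migrating slot currently points, the slot markers in the CLUSTER NODES output can be parsed. The sketch below assumes Garnet prints slot transitions in the same form Redis does (`[slot->-nodeid]` for migrating, `[slot-<-nodeid]` for importing, shown only on the `myself` line); the node ids and addresses in the sample are made up for illustration.

```python
import re

# Redis-style markers: "[<slot>->-<node-id>]" = migrating, "[<slot>-<-<node-id>]" = importing.
_MARKER = re.compile(r"^\[(\d+)-([<>])-([^\]]+)\]$")


def slot_transitions(cluster_nodes_output: str) -> dict:
    """Map slot -> (state, peer node id) for the line flagged `myself`."""
    transitions = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        # Fields: id, address, flags, primary, ping, pong, epoch, link state, slots...
        if len(fields) < 8 or "myself" not in fields[2].split(","):
            continue
        for token in fields[8:]:
            match = _MARKER.match(token)
            if match:
                slot, arrow, peer = match.groups()
                state = "migrating" if arrow == ">" else "importing"
                transitions[int(slot)] = (state, peer)
    return transitions


# Hypothetical output from the migrating node (ids shortened for readability).
sample = (
    "bbbb 127.0.0.1:7001@17001 myself,master - 0 0 2 connected 100 [100->-aaaa]\n"
    "cccc 127.0.0.1:7002@17002 master - 0 0 3 connected\n"
)
print(slot_transitions(sample))  # {100: ('migrating', 'aaaa')}
```

Running this against the output captured before and after CLUSTER FORGET shows whether the peer id for the slot changed or the transition was cleared.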
Steps to reproduce the bug
Suppose we have a cluster with NodeA: 0-16383, NodeB, and NodeC. To reproduce the bug, follow these steps:
1. Node A: CLUSTER SETSLOT X IMPORTING nodeidB
2. Node B: CLUSTER SETSLOT X MIGRATING nodeidA
3. Node A: CLUSTER RESET HARD
4. Node C: CLUSTER FORGET nodeidA
5. Node B: CLUSTER NODES
6. Node B: CLUSTER FORGET nodeidA
7. Node B: CLUSTER NODES
The results for CLUSTER NODES at steps 5 and 7 are different. Ideally, the state of slot X should have been reset.
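The steps above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library: it encodes each command as a RESP array and sends it over a plain socket. The `repro_steps` and `send` helpers are hypothetical names introduced here, the node addresses must be supplied by the caller, and `send` reads a single chunk, which is only adequate for small replies.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def repro_steps(slot: int, node_a_id: str, node_b_id: str) -> list:
    """Return (target node, command) pairs in the order listed in the report."""
    return [
        ("A", ("CLUSTER", "SETSLOT", str(slot), "IMPORTING", node_b_id)),
        ("B", ("CLUSTER", "SETSLOT", str(slot), "MIGRATING", node_a_id)),
        ("A", ("CLUSTER", "RESET", "HARD")),
        ("C", ("CLUSTER", "FORGET", node_a_id)),
        ("B", ("CLUSTER", "NODES")),   # step 5: snapshot before the forget on B
        ("B", ("CLUSTER", "FORGET", node_a_id)),
        ("B", ("CLUSTER", "NODES")),   # step 7: snapshot after the forget on B
    ]


def send(address, command, timeout=2.0) -> bytes:
    """Send one command to (host, port) and return the raw reply bytes."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(encode_command(*command))
        return conn.recv(65536)  # single read; large replies may be truncated


print(encode_command("CLUSTER", "NODES"))
# b'*2\r\n$7\r\nCLUSTER\r\n$5\r\nNODES\r\n'
```

To run it against a test cluster, map "A", "B", and "C" to the `(host, port)` of each node, loop over `repro_steps(...)`, call `send` for each pair, and compare the replies from steps 5 and 7.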
Expected behavior
Whenever we issue CLUSTER FORGET targetnodeid to a node that is migrating or importing a slot to the target node, it should reset the state of the slot.
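The expected behavior can be illustrated with a small toy model. This is not Garnet's implementation; `SlotState` and `NodeView` are hypothetical types introduced only to show the rule: forgetting a node should return any slot that is migrating to, or importing from, that node to a stable state, and leave other slots untouched.

```python
from dataclasses import dataclass, field


@dataclass
class SlotState:
    owner: str
    state: str = "stable"  # "stable", "migrating", or "importing"
    peer: str = ""         # node on the other side of the transfer, if any


@dataclass
class NodeView:
    node_id: str
    known_nodes: set = field(default_factory=set)
    slots: dict = field(default_factory=dict)  # slot number -> SlotState

    def forget(self, target_id: str) -> None:
        """Drop target_id and reset every slot transition that involves it."""
        self.known_nodes.discard(target_id)
        for slot in self.slots.values():
            if slot.state != "stable" and slot.peer == target_id:
                slot.state, slot.peer = "stable", ""


# Node B is migrating slot 100 to A and slot 200 to C, then forgets A.
node_b = NodeView("B", {"A", "C"}, {
    100: SlotState(owner="B", state="migrating", peer="A"),
    200: SlotState(owner="B", state="migrating", peer="C"),
})
node_b.forget("A")
print(node_b.slots[100])  # SlotState(owner='B', state='stable', peer='')
```

In this model the slot never ends up pointing at a third node, which is the outcome the report asks for.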
![image](https://github.com/microsoft/garnet/assets/137490230/f41bec8b-3745-4e0f-b9d1-fd2d51bf6002)
For context: ![image](https://github.com/microsoft/garnet/assets/137490230/53f1f6a1-d30d-4bbe-8ba6-91bf388d9673)