tiagolobocastro opened this issue 1 week ago
It looks like there was a heartbeat failure, which caused the control-plane to mark the node as offline. In turn, this means we did not set the shutdown request for the nexus node. Nonetheless, we should not have attempted to destroy the replica, because the nexus was not verified as shut down! Should we also add a check on the data-plane, ensuring the replica is not in use, to avoid getting into this kind of trouble?
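A minimal sketch of the control-plane guard described above. The names `NexusShutdown`, `DestroyDecision` and `replica_destroy_decision` are hypothetical and do not come from mayastor-control-plane; the only point illustrated is that a nexus on a node marked offline maps to "shutdown not verified", which defers the replica destroy instead of allowing it.

```rust
/// Hypothetical view of what the control-plane knows about the nexus
/// that uses a replica. Illustrative only, not mayastor-control-plane types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NexusShutdown {
    /// The nexus node acknowledged the shutdown request.
    Verified,
    /// The node was marked offline (e.g. missed heartbeats), so the
    /// shutdown was never confirmed.
    Unknown,
    /// No shutdown has been requested yet.
    NotRequested,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DestroyDecision {
    Proceed,
    Defer(&'static str),
}

/// Allow a replica destroy only when its owning nexus (if any) is
/// verified as shut down. An offline node is never treated as "shut down".
pub fn replica_destroy_decision(owner: Option<NexusShutdown>) -> DestroyDecision {
    match owner {
        // The replica is not part of any nexus: safe to destroy.
        None => DestroyDecision::Proceed,
        Some(NexusShutdown::Verified) => DestroyDecision::Proceed,
        Some(NexusShutdown::Unknown) => {
            DestroyDecision::Defer("nexus node offline; shutdown not verified")
        }
        Some(NexusShutdown::NotRequested) => {
            DestroyDecision::Defer("nexus shutdown not requested yet")
        }
    }
}
```

A deferred decision would leave the replica in place until a later reconcile pass can confirm the nexus is gone.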
Pool lock was taken and never released
The DestroyReplica call is the one being starved of the lock, but who is holding the pool lock here?
The first DestroyReplica call.
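One way to keep a stuck holder from starving every later call is to fail fast instead of waiting. This sketch assumes a synchronous `std::sync::Mutex` as the pool lock purely for simplicity (the data-plane's real lock may well be async); `PoolHandle`, `PoolError` and `with_lock_or_busy` are hypothetical names. It does not fix the leaked lock itself, it only turns an indefinite wait into a `Busy` error the caller can report.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical handle to a pool and its lock.
pub struct PoolHandle {
    pub name: String,
    pub lock: Mutex<()>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PoolError {
    /// Another operation currently holds the pool lock.
    Busy(String),
    /// A previous holder panicked while holding the lock.
    Poisoned(String),
}

impl PoolHandle {
    /// Run `op` while holding the pool lock, but return `Busy` immediately
    /// instead of queueing behind a holder that may never release it.
    pub fn with_lock_or_busy<T>(&self, op: impl FnOnce() -> T) -> Result<T, PoolError> {
        match self.lock.try_lock() {
            // `_guard` keeps the lock held while `op` runs; it is dropped
            // (releasing the lock) when this arm finishes.
            Ok(_guard) => Ok(op()),
            Err(TryLockError::WouldBlock) => Err(PoolError::Busy(self.name.clone())),
            Err(TryLockError::Poisoned(_)) => Err(PoolError::Poisoned(self.name.clone())),
        }
    }
}
```

Failing fast trades a hung call for a retryable error; whether that is preferable to a bounded wait is a design choice for the real service.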
This ticket needs two fixes:
Control-plane changes: https://github.com/openebs/mayastor-control-plane/pull/862
Describe the bug
The pool lock was taken and never released. This means all gRPC calls for that pool will fail!
To Reproduce
It seems this may happen if we try to delete a replica that is part of a nexus!
Expected behavior
Don't hold the pool lock forever.
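A sketch of the data-plane side, assuming a hypothetical `claimed_by` field that records which nexus is using a replica. The pool lock is held through an RAII guard, so it is released on every return path, including the "in use" rejection. `Replica`, `PoolReplicas` and `DestroyError` are illustrative names, not mayastor types.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical replica record: `claimed_by` names the nexus using it, if any.
#[derive(Debug)]
pub struct Replica {
    pub claimed_by: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DestroyError {
    NotFound,
    InUse(String),
}

/// Hypothetical pool: the mutex acts as the pool lock and guards the replicas.
pub struct PoolReplicas {
    pub replicas: Mutex<HashMap<String, Replica>>,
}

impl PoolReplicas {
    pub fn destroy_replica(&self, uuid: &str) -> Result<(), DestroyError> {
        // The guard releases the pool lock when it goes out of scope,
        // including on every early return below. A poisoned lock is
        // recovered rather than left unusable.
        let mut replicas = self.replicas.lock().unwrap_or_else(|e| e.into_inner());
        let replica = replicas.get(uuid).ok_or(DestroyError::NotFound)?;
        // Refuse to destroy a replica that a nexus is still using.
        if let Some(nexus) = &replica.claimed_by {
            return Err(DestroyError::InUse(nexus.clone()));
        }
        replicas.remove(uuid);
        Ok(())
    }
}
```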
Additional context
This was found in another report: #1734