openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes, provisioned from an optimized NVMe SPDK backend data storage stack.
Apache License 2.0

Rebuilding/recovering a DiskPool from the device even if mayastor state is lost #1752

Open michaelbeaumont opened 3 weeks ago

michaelbeaumont commented 3 weeks ago

Is your feature request related to a problem? Please describe.

What happens if I lose my entire Kubernetes and mayastor deployment, meaning both the mayastor etcd and the k8s etcd, so that all my PVs/PVCs are gone (i.e. without being deleted via the API)? It seems that if I recreate the cluster and create the same diskpool, the existing data on the device is recognized in some sense: I see a non-zero amount of used space, and I even get an import error if the name of the pool has changed. But what can I actually recover from this disk pool?

Issue created from Slack thread

Describe the solution you'd like

Somehow it should be possible to recover the data and, ideally, the PVs/PVCs.

tiagolobocastro commented 2 weeks ago

There are a few steps that we'd need to take to make this possible. The biggest catch at the moment is that replica health is delegated entirely to etcd, so when recovering, one would need to inspect each of a volume's replicas and manually determine which one is the most up to date. To help tackle this, we should also write health information out to each replica, for example using the metadata partition we have on each replica, or using replica attributes.
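
To make the idea concrete, here is a rough sketch of how recovery tooling might choose a replica per volume if health information were persisted on the replicas themselves. Nothing here is an existing mayastor API: the `ReplicaHealth` record, its `generation` and `healthy` fields, and the `pick_recovery_candidates` helper are all hypothetical, and it assumes each replica stores a counter that increases whenever the volume's health state changes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ReplicaHealth:
    """Hypothetical health record read back from a replica's own metadata."""
    replica_id: str
    volume_id: str
    generation: int  # assumed monotonic counter, bumped on each health change
    healthy: bool    # assumed flag: False if the replica was faulted or rebuilding


def pick_recovery_candidates(
    records: Iterable[ReplicaHealth],
) -> Dict[str, Optional[ReplicaHealth]]:
    """Return, per volume, the healthy replica with the highest generation.

    A volume maps to None when none of its replicas is marked healthy, which
    would still require manual inspection.
    """
    best: Dict[str, Optional[ReplicaHealth]] = {}
    for record in records:
        # Make sure every volume we have seen appears in the result.
        best.setdefault(record.volume_id, None)
        if not record.healthy:
            continue
        current = best[record.volume_id]
        if current is None or record.generation > current.generation:
            best[record.volume_id] = record
    return best


if __name__ == "__main__":
    found = [
        ReplicaHealth("replica-a", "vol-1", generation=7, healthy=True),
        ReplicaHealth("replica-b", "vol-1", generation=9, healthy=True),
        ReplicaHealth("replica-c", "vol-1", generation=12, healthy=False),
        ReplicaHealth("replica-d", "vol-2", generation=3, healthy=False),
    ]
    for volume_id, choice in pick_recovery_candidates(found).items():
        print(volume_id, choice.replica_id if choice else "needs manual inspection")
```

In this sketch a replica that was marked unhealthy is never chosen, even if its counter is higher, because its data may be incomplete. Replicas that tie on the counter are treated as equally valid and the first one seen is kept.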