k8ssandra / medusa-operator

A Kubernetes operator for managing Cassandra backups/restores with Medusa
Apache License 2.0

Backup can succeed without being marked as finished #57

Closed adejanovski closed 2 years ago

adejanovski commented 3 years ago

Some backups never get marked as finished, for reasons that are still unclear. Looking at the code, the doBackup() gRPC call appears to be a blocking call running in a goroutine. Some backups can last for many hours, which makes relying on a blocking HTTP call unreliable. Even starting the backup and then checking its status in the storage bucket (using medusa status, for example) would not be reliable: it would detect successful backups but not failed ones, which look like running ones to Medusa. Instead, we'd need to make doBackup() a short operation that starts a thread running the actual backup. Another gRPC operation should be created to check the state of that thread, so that the backup can be monitored asynchronously. The Medusa parts of this are captured in this issue.

┆Issue is synchronized with this Jira Bug by Unito ┆Affected Versions: k8ssandra-1.2.0,k8ssandra-1.3.0 ┆Epic: Remote Cluster Restore ┆Issue Number: K8SSAND-624 ┆Priority: Medium

sync-by-unito[bot] commented 3 years ago

➤ Jeff DiNoto commented:

Open question to be looked at: Can we restore from a backup that gets into this state?

jsanda commented 3 years ago

@jdonenine yes, we can restore from a backup that gets into this state. The restore controller should do up-front validation, but it currently does nothing of the sort, such as checking that the backup has completed successfully.

adejanovski commented 2 years ago

Issue moved to k8ssandra/k8ssandra-operator #633 via ZenHub