thelastpickle / cassandra-medusa

Apache Cassandra Backup and Restore Tool
Apache License 2.0

Provide web api for medusa #88

Open ANeumann82 opened 4 years ago

ANeumann82 commented 4 years ago


I'm currently in the process of developing a Kubernetes operator for Cassandra, including the option for backup and restore, and Medusa seems like a good fit for that.

The current approach is to run Medusa in a container alongside Cassandra and trigger the backup there. This works fine so far, although Medusa isn't really designed to be run this way.

In my current build, I have written a very thin layer around the CLI that starts a web server and lets a user run the Medusa commands via HTTP requests, as this is a more Kubernetes-style approach than SSHing or `kubectl exec`-ing into the container and running the commands there.

Is this something that could be integrated here upstream? I'd like some feedback on whether it makes sense for me to put more work into it, write a more complete design proposal, implement it for the rest of the CLI commands, and open a PR, or if this is something that's not going to be integrated anyway.
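For reference, the thin layer is roughly this shape (a simplified sketch; the endpoint and query parameter names are illustrative, and it assumes the sidecar image has `medusa` on its PATH):

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def build_command(action, backup_name):
    """Map an HTTP request onto a medusa CLI invocation."""
    return ["medusa", action, "--backup-name", backup_name]

class MedusaHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # e.g. POST /backup?name=nightly -> `medusa backup --backup-name nightly`
        url = urlparse(self.path)
        if url.path != "/backup":
            self.send_response(404)
            self.end_headers()
            return
        name = parse_qs(url.query).get("name", ["adhoc"])[0]
        # Shell out to the CLI rather than importing Medusa internals,
        # so the wrapper stays a thin layer over the existing commands.
        result = subprocess.run(build_command("backup", name),
                                capture_output=True, text=True)
        body = json.dumps({"returncode": result.returncode}).encode()
        self.send_response(200 if result.returncode == 0 else 500)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve in the sidecar:
#   HTTPServer(("", 8080), MedusaHandler).serve_forever()
```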

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: MED-87

jsanda commented 4 years ago

The topic of Kubernetes integration has come up before in #24. I shared some detailed thoughts on it in this comment.

> I'd like to get some feedback if it makes sense for me to put a bit more work into it, write a more complete design proposal and implement it for the rest of the CLI commands and open a PR for this, or if this is something that's not going to be integrated anyway.

I would be very happy to provide feedback and to collaborate on making Medusa a good solution for Kubernetes-based deployments.

If Medusa has an API over HTTP, is the idea that Medusa would run in a sidecar container alongside Cassandra and that the operator would trigger backups and restores via the HTTP API?

An HTTP API definitely makes sense, especially from the perspective of writing an operator, but what about running Medusa as a k8s Job? There could be separate Jobs for backups and for restores. That would not require an HTTP API; however, there are still changes needed in Medusa to run it effectively in k8s.

There is another point worth noting about running Medusa in a Job. I think doing so would make it easier to use Medusa for Cassandra clusters that are not created and managed by an operator, and an operator could also take full advantage of those Jobs. While there are several C* operators, there is no de facto standard, and I would guess that the majority of folks who run C* in k8s do not use any operator at all.

Have you used Medusa in k8s? If so, I am curious to know how you handle stopping Cassandra during restores (see my write up in #24 for some background).

Have you looked into using CSI? (I ask since it provides a backup/restore API for persistent volumes)

ANeumann82 commented 4 years ago

Yes, the idea is to run Medusa in a sidecar container for backups, and probably as an init container for restores - as you correctly mentioned, stopping Cassandra for a restore is kind of problematic. Maybe I'll come up with a better solution when I tackle the restore side of things.

Running Medusa as a Job was my first idea as well, and I still use a Job to trigger the backup inside the sidecar container. The problem with a Job - or rather anything besides a sidecar container - is that to access the data storage of the C* pod, the PV has to be mounted as ReadWriteMany, and a lot of storage solutions do not provide that access mode.

I agree that there's probably more work necessary to make Medusa really fit for k8s, but an HTTP API as an addition to the CLI interface might be a good first step.

Haven't looked into CSI; going to do that on Monday.

jsanda commented 4 years ago

What would the APIs look like, at a high level? I am curious, considering backup and restore can be long-running operations.
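One shape that handles long-running operations is the async pattern: the POST starts the work and returns an operation id immediately, and the caller polls for status. A rough sketch of the bookkeeping (names are hypothetical, not an existing Medusa API):

```python
import threading
import time
import uuid

operations = {}          # op_id -> {"state": ..., "detail": ...}
operations_lock = threading.Lock()

def start_operation(target, *args):
    """Run `target` in a background thread and track its lifecycle."""
    op_id = str(uuid.uuid4())
    with operations_lock:
        operations[op_id] = {"state": "RUNNING", "detail": None}

    def runner():
        try:
            detail = target(*args)
            state = "SUCCEEDED"
        except Exception as exc:   # surface failures to pollers
            detail, state = str(exc), "FAILED"
        with operations_lock:
            operations[op_id] = {"state": state, "detail": detail}

    threading.Thread(target=runner, daemon=True).start()
    return op_id

def get_operation(op_id):
    """What a GET /operations/<id> endpoint would return."""
    with operations_lock:
        return operations.get(op_id)

def wait_for(op_id, timeout=30.0, interval=0.05):
    """Convenience poller (an operator would poll over HTTP instead)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_operation(op_id)
        if status and status["state"] != "RUNNING":
            return status
        time.sleep(interval)
    return get_operation(op_id)
```

The HTTP layer would then map `POST /backup` to `start_operation(...)` and `GET /operations/<id>` to `get_operation(...)`, so the request never blocks on the backup itself.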

jsanda commented 4 years ago

I am unassigning myself at least for now because I am instead working on #137.