nuvo / cain

Backup and restore tool for Cassandra on Kubernetes
Apache License 2.0
32 stars 21 forks source link
abs aws azure azure-blob-storage backup cain cassandra go golang helm kubernetes restore s3 schema snapshot

Release Travis branch Docker Pulls Go Report Card license

Cain

Cain is a backup and restore tool for Cassandra on Kubernetes. It is named after the DC Comics superhero Cassandra Cain.

Cain supports the following cloud storage services:

Cain is now an official part of the Helm incubator/cassandra chart!

Install

Prerequisites

  1. git
  2. dep

From a release

Download the latest release from the Releases page or use it with a Docker image

From source

mkdir -p $GOPATH/src/github.com/nuvo && cd $_
git clone https://github.com/nuvo/cain.git && cd cain
make

Commands

Backup Cassandra cluster to cloud storage

Cain performs a backup in the following way:

  1. Backup the keyspace schema (using cqlsh).
  2. Get backup data using nodetool snapshot - it creates a snapshot of the keyspace in all Cassandra pods in the given namespace (according to selector).
  3. Copy the files in parallel to cloud storage using Skbn - it copies the files to the specified dst, under namespace/<cassandrClusterName>/keyspace/<keyspaceSchemaHash>/tag/.
  4. Clear all snapshots.

Usage

$ cain backup --help
backup cassandra cluster to cloud storage

Usage:
  cain backup [flags]

Flags:
  -a, --authentication                     use authentication for nodetool and clqsh. Overrides $CAIN_AUTHENTICATION
  -b, --buffer-size float                  in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
      --cassandra-data-dir string          cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
  -u, --cassandra-username string          cassandra username. Overrides $CAIN_CASSANDRA_USERNAME (default "cain")
  -c, --container string                   container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
      --dst string                         destination to backup to. Example: s3://bucket/cassandra. Overrides $CAIN_DST
  -h, --help                               help for backup
  -k, --keyspace string                    keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string                   namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
      --nodetool-credentials-file string   path to nodetool credentials file. Overrides $CAIN_NODETOOL_CREDENTIALS_FILE (default "/home/cassandra/.nodetool/credentials")
  -p, --parallel int                       number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
  -m, --s3-max-upload-parts int            maximum number of parts to upload in parallel for s3 multipart upload. Overrides $CAIN_S3_MAX_UPLOAD_PARTS (default 10000)
  -s, --s3-part-size int                   size of each part in bytes for s3 multipart upload. Overrides $CAIN_S3_PART_SIZE (default 134217728)
  -l, --selector string                    selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")

Examples

Backup to AWS S3

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst s3://db-backup/cassandra

Backup to AWS S3 with Cassandra authentication enabled

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst s3://db-backup/cassandra
    -a
    -u cain
    --nodetool-credentials-file /home/cassandra/.nodetool/credentials

Backup to Azure Blob Storage

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst abs://my-account/db-backup-container/cassandra

Restore Cassandra backup from cloud storage

Cain performs a restore in the following way:

  1. Restore schema if schema is specified.
  2. Truncate all tables in keyspace.
  3. Copy files from the specified src (under keyspace/<keyspaceSchemaHash>/tag/) - restore is only possible for the same keyspace schema.
  4. Load new data using nodetool refresh.

Usage

$ cain restore --help
restore cassandra cluster from cloud storage

Usage:
  cain restore [flags]

Flags:
  -a, --authentication                     use authentication for nodetool and clqsh. Overrides $CAIN_AUTHENTICATION
  -b, --buffer-size float                  in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
      --cassandra-data-dir string          cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
  -u, --cassandra-username string          cassandra username. Overrides $CAIN_CASSANDRA_USERNAME (default "cain")
  -c, --container string                   container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
  -h, --help                               help for restore
  -k, --keyspace string                    keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string                   namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -f, --nodetool-credentials-file string   path to nodetool credentials file. Overrides $CAIN_NODETOOL_CREDENTIALS_FILE (default "/home/cassandra/.nodetool/credentials")
  -p, --parallel int                       number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
  -s, --schema string                      schema version to restore (optional). Overrides $CAIN_SCHEMA
  -l, --selector string                    selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
      --src string                         source to restore from. Example: s3://bucket/cassandra/namespace/cluster-name. Overrides $CAIN_SRC
  -t, --tag string                         tag to restore. Overrides $CAIN_TAG
      --user-group string                  user and group who should own restored files. Overrides $CAIN_USER_GROUP (default "cassandra:cassandra")

Examples

Restore from S3

cain restore \
    --src s3://db-backup/cassandra/default/ring01
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Restore from Azure Blob Storage

cain restore \
    --src s3://my-account/db-backup-container/cassandra/default/ring01
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Describe keyspace schema

Cain describes the keyspace schema using cqlsh. It can return the schema itself, or a checksum of the schema file (used by backup and restore).

Usage

$ cain schema --help
get schema of cassandra cluster

Usage:
  cain schema [flags]

Flags:
  -c, --container string   container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
  -k, --keyspace string    keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string   namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -l, --selector string    selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
      --sum                print only checksum. Overrides $CAIN_SUM

Examples

cain schema \
    -n default \
    -l release=cassandra \
    -k keyspace
cain schema \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --sum

Environment variables support

Cain commands support the usage of environment variables instead of flags. For example: The backup command can be executed as mentioned in the example:

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst s3://db-backup/cassandra

You can also set the appropriate envrionment variables (CAINFLAG, instead of -):

export CAIN_NAMESPACE=default
export CAIN_SELECTOR=release=cassandra
export CAIN_KEYSPACE=keyspace
export CAIN_DST=s3://db-backup/cassandra

cain backup

Support for additional storage services

Since Cain uses Skbn, adding support for additional storage services is simple. Read this post for more information.

Skbn compatibility matrix

Cain version Skbn version
0.5.3 0.5.1
0.5.2 0.4.2
0.5.1 0.4.2
0.5.0 0.4.1
0.4.2 0.4.1
0.4.1 0.4.1
0.4.0 0.4.0
0.3.0 0.3.0
0.2.0 0.2.0
0.1.0 0.1.1

Credentials

Kubernetes

Cain tries to get credentials in the following order:

  1. if KUBECONFIG environment variable is set - skbn will use the current context from that config file
  2. if ~/.kube/config exists - skbn will use the current context from that config file with an out-of-cluster client configuration
  3. if ~/.kube/config does not exist - skbn will assume it is working from inside a pod and will use an in-cluster client configuration

AWS

Skbn uses the default AWS credentials chain.

Azure Blob Storage

Skbn uses AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY environment variables for authentication.

Cassandra Credentials

When Authentication is enabled Cain will look for default credentials
for cqlsh in /home/cassandra/.cassandra/credentials
if you use authentication please make sure the cassandra
container has this file and the username and password are correct.

For nodetool authentications default credentials are in:
/home/cassandra/.nodetool/credentials can be overridden by
setting the --nodetool-credentials-file flag.
When this flag is used, the username for the nodetool
authentication must be provided as well .

Examples

  1. Helm example
  2. Code example