openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes that is provisioned from an optimized NVME SPDK backend data storage stack.
Apache License 2.0

etcd backup, restore / s3 #1645

Open todeb opened 5 months ago

todeb commented 5 months ago

Is your feature request related to a problem? Please describe.

If the nodes running etcd get corrupted, or in a disaster recovery scenario, etcd must be restored to an operational state.

Describe the solution you'd like

Use the Kubernetes (k8s) etcd, or provide functionality for scheduled, consistent snapshots of etcd.

Describe alternatives you've considered

  1. Create a cron job with a custom image containing etcdctl and the AWS CLI, and add a custom command that takes etcd snapshots and transfers them to S3.


tiagolobocastro commented 4 months ago

etcd snapshots seem like a good thing to have in general. We've also discussed, on another ticket, decoupling the configuration from etcd and allowing users to choose from one of several options.

avishnu commented 5 days ago

As per the above comment, the scope of this implementation will be to have documented steps published. However, in case of disaster, we need to perform additional steps to identify the 'correct' states of the volumes before restoring the configuration from the etcd backup. Scoping for v4.3

todeb commented 4 days ago

I already built my own custom snapshotter; maybe it will help someone as a temporary solution.

dockerfile

FROM alpine:latest

# Install necessary packages
RUN apk add --no-cache curl ca-certificates openssl unzip

# Install etcdctl
RUN wget -q https://github.com/etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz && \
    tar -xzf etcd-v3.5.0-linux-amd64.tar.gz && \
    cp etcd-v3.5.0-linux-amd64/etcdctl /usr/local/bin/etcdctl && \
    rm -rf etcd-v3.5.0-linux-amd64.tar.gz etcd-v3.5.0-linux-amd64

# Install AWS CLI v2
RUN apk add --no-cache aws-cli

# Set up working directory
WORKDIR /app

# Copy entrypoint script
COPY entrypoint.sh /app/entrypoint.sh

# Make entrypoint script executable
RUN chmod +x entrypoint.sh

# Define entrypoint
ENTRYPOINT ["/app/entrypoint.sh"]

entrypoint.sh

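#!/bin/sh
# Abort on any failed command or unset variable, so a failed snapshot is never uploaded
set -eu

# Take etcd snapshot
snapshotname="etcd-snapshot_$(date '+%Y-%m-%d_%H%M%S')"
etcdctl --endpoints="${ETCD_ENDPOINTS}" snapshot save "$snapshotname"

# Upload snapshot to the S3-compatible endpoint (e.g. MinIO)
aws s3 cp "$snapshotname" "s3://${S3_BUCKET}/" --endpoint-url "${S3_URL}"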
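
The entrypoint script relies on ETCD_ENDPOINTS, S3_BUCKET and S3_URL all being provided through the environment. As a minimal sketch, a pre-flight check along these lines could report every missing or empty variable up front, before etcdctl or aws is invoked. The helper name `require_env` is hypothetical and not part of the original script.

```shell
#!/bin/sh
# require_env NAME...: return non-zero if any named variable is unset or empty.
# Hypothetical helper for illustration; not part of the original entrypoint.sh.
require_env() {
    missing=0
    for name in "$@"; do
        # POSIX sh has no indirect expansion, so look the variable up via eval.
        # The names passed in are fixed strings chosen by the script author.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It would be called near the top of the script, for example as `require_env ETCD_ENDPOINTS S3_BUCKET S3_URL || exit 1`.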

cronjob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-snapshot
  namespace: openebs
spec:
  schedule: "0 1 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: etcd-snapshot
            image: tode/etcd-snap:0.1
            imagePullPolicy: IfNotPresent
            envFrom:
            - secretRef:
                name: etcd-s3-creds
          restartPolicy: Never

etcd-s3-creds

apiVersion: v1
kind: Secret
metadata:
  name: etcd-s3-creds
  namespace: openebs
type: Opaque
data:
  S3_URL: ...
  AWS_ACCESS_KEY_ID: ...
  AWS_SECRET_ACCESS_KEY: ...
  S3_BUCKET: ...
  ETCD_ENDPOINTS: ...
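
Values under `data:` in a Kubernetes Secret must be base64-encoded. As a minimal sketch, each value could be encoded as shown here before being pasted into the manifest. The helper name `encode_secret_value` is hypothetical, and `my-bucket` is a placeholder, not a value from this issue.

```shell
#!/bin/sh
# encode_secret_value VALUE: print VALUE base64-encoded on a single line.
# printf avoids encoding a trailing newline; tr removes any line wrapping
# that base64 implementations may add for long inputs.
encode_secret_value() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

encode_secret_value 'my-bucket'   # prints bXktYnVja2V0
```

Alternatively, the Secret's `stringData:` field accepts plain-text values, which avoids manual encoding.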