Open todeb opened 5 months ago
etcd snapshots seem like a good thing to have in general. We've also discussed, on another ticket, decoupling the configuration from etcd and allowing users to choose from one of several options.
As per the above comment, the scope of this implementation will be to have documented steps published. However, in case of disaster, we need to perform additional steps to identify the 'correct' states of the volumes before restoring the configuration from the etcd backup. Scoping for v4.3
I already built my own custom snapshotter; maybe it will help someone as a temporary solution.
FROM alpine:latest
# Install necessary packages
RUN apk add --no-cache curl ca-certificates openssl unzip
# Install etcdctl
RUN wget -q https://github.com/etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz && \
    tar -xzf etcd-v3.5.0-linux-amd64.tar.gz && \
    cp etcd-v3.5.0-linux-amd64/etcdctl /usr/local/bin/etcdctl && \
    rm -rf etcd-v3.5.0-linux-amd64.tar.gz etcd-v3.5.0-linux-amd64
# Install AWS CLI v2
RUN apk add --no-cache aws-cli
# Set up working directory
WORKDIR /app
# Copy entrypoint script
COPY entrypoint.sh /app/entrypoint.sh
# Make entrypoint script executable
RUN chmod +x entrypoint.sh
# Define entrypoint
ENTRYPOINT ["/app/entrypoint.sh"]
#!/bin/sh
# Exit on the first error (or unset variable) so a failed snapshot is never uploaded
set -eu
# Take etcd snapshot
snapshotname="etcd-snapshot_$(date '+%Y-%m-%d_%H%M%S')"
etcdctl --endpoints="${ETCD_ENDPOINTS}" snapshot save "$snapshotname"
# Upload snapshot to MinIO
aws s3 cp "$snapshotname" "s3://${S3_BUCKET}/" --endpoint-url "${S3_URL}"
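The script above uploads whatever file `etcdctl` leaves behind. As a hedged sketch (not part of the original snapshotter), a small guard could refuse to upload a missing or empty snapshot file. The function name `snapshot_ok` is hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# snapshot_ok FILE: succeed only if FILE exists and is non-empty.
# Hypothetical helper; the name is an assumption, not from the original script.
snapshot_ok() {
    if [ ! -s "$1" ]; then
        echo "snapshot '$1' is missing or empty; skipping upload" >&2
        return 1
    fi
}

# Possible use inside entrypoint.sh, between the save and the upload:
#   snapshot_ok "$snapshotname" || exit 1
```

This only checks the file size. A stricter check could also run `etcdctl snapshot status` on the file before uploading, which I have not tested in this image.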
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-snapshot
  namespace: openebs
spec:
  schedule: "0 1 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: etcd-snapshot
              image: tode/etcd-snap:0.1
              imagePullPolicy: IfNotPresent
              envFrom:
                - secretRef:
                    name: etcd-s3-creds
          restartPolicy: Never
apiVersion: v1
kind: Secret
metadata:
  name: etcd-s3-creds
  namespace: openebs
type: Opaque
data:
  S3_URL: ...
  AWS_ACCESS_KEY_ID: ...
  AWS_SECRET_ACCESS_KEY: ...
  S3_BUCKET: ...
  ETCD_ENDPOINTS: ...
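Values under a Secret's `data:` key must be base64-encoded. A minimal sketch of producing such a value is shown below; the helper name `encode_secret_value` and the bucket name `etcd-backups` are hypothetical and used only for illustration.

```shell
#!/bin/sh
# encode_secret_value VALUE: print VALUE base64-encoded on a single line,
# in the form expected under a Secret's `data:` key.
encode_secret_value() {
    # printf avoids the trailing newline that echo would add to the value;
    # tr strips any line wrapping that base64 applies to long inputs.
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Hypothetical bucket name, for illustration only.
encode_secret_value 'etcd-backups'   # prints ZXRjZC1iYWNrdXBz
```

Alternatively, plain-text values can go under `stringData:` instead of `data:`, in which case no manual encoding is needed.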
Is your feature request related to a problem? Please describe. If the nodes running etcd get corrupted, or in a disaster-recovery scenario, etcd must be restored to an operational state.
Describe the solution you'd like: Use the Kubernetes etcd, or provide functionality for scheduled, consistent snapshots of etcd.