nebari-dev / nebari-docs

📖 Documentation for Nebari
https://www.nebari.dev
BSD 3-Clause "New" or "Revised" License

[DOC] - Consider adding a k8s job in backup/restore docs #462

Open Adam-D-Lewis opened 1 month ago

Adam-D-Lewis commented 1 month ago

Preliminary Checks

Summary

Consider adding a k8s Job to the file system backup docs. A k8s Job is preferable to a simple pod when the file system is large and copying all the data takes a long time: if you try to tar everything up from JupyterLab, your server can time out due to inactivity before everything is copied into the tarball. A k8s Job avoids this, e.g. something like

kind: Job
apiVersion: batch/v1
metadata:
  name: backup
  namespace: dev
spec:
  template:
    spec:
      volumes:
        - name: backup-volume
          persistentVolumeClaim:
            claimName: "jupyterhub-dev-share"
      containers:
        - name: debugger
          image: ubuntu
          command: ["/bin/bash", "-c", "cd /data && tar -cvpzf 2024-03-08-shared.tar.gz shared && echo 'Backup complete' > backup.txt"]
          volumeMounts:
            - mountPath: "/data"
              name: backup-volume
      restartPolicy: OnFailure
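Assuming the manifest above is saved as `backup-job.yaml`, a hedged sketch of running and monitoring the backup (the namespace `dev` and Job name `backup` come from the example above; tune the timeout to your data size):

```shell
# Create the backup Job defined above
kubectl apply -f backup-job.yaml

# Block until the Job finishes; a large shared volume can take hours to tar
kubectl -n dev wait --for=condition=complete job/backup --timeout=6h

# Inspect the container logs to confirm the archive was written
kubectl -n dev logs job/backup
```

Unlike an interactive `tar` session in JupyterLab, the Job keeps running even if your own session disconnects.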

and for restore

kind: Job
apiVersion: batch/v1
metadata:
  name: restore
  namespace: dev
spec:
  template:
    spec:
      volumes:
        - name: backup-volume
          persistentVolumeClaim:
            claimName: "jupyterhub-dev-share"
      containers:
        - name: debugger
          image: ubuntu
          command: ["/bin/bash", "-c", "cd /data && tar -xvpzf 2024-03-08-shared.tar.gz --skip-old-files && echo 'Restore complete' > restore2.txt"]
          volumeMounts:
            - mountPath: "/data"
              name: backup-volume
      restartPolicy: OnFailure

Steps to Resolve this Issue

-

marcelovilla commented 1 month ago

I think this would be useful for users until the backup and restore mechanism (see https://github.com/nebari-dev/governance/issues/49) is in place.

We can add some further logic to the job definitions to install the AWS CLI and upload/download the tarball to/from a given S3 bucket.
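A hedged sketch of what that could look like, extending the backup Job's container to install the AWS CLI and push the tarball to a bucket. The bucket name `my-backup-bucket` and the credentials secret `aws-backup-creds` are placeholders, not anything Nebari provides:

```yaml
# Container spec fragment only; volumes and restartPolicy as in the Job above
containers:
  - name: backup-uploader
    image: ubuntu
    command:
      - /bin/bash
      - -c
      - |
        # Install the official AWS CLI v2 (x86_64 assumed)
        apt-get update && apt-get install -y curl unzip
        curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
        unzip -q awscliv2.zip && ./aws/install
        # Create the archive, then upload it; bucket name is a placeholder
        cd /data && tar -cvpzf 2024-03-08-shared.tar.gz shared
        aws s3 cp 2024-03-08-shared.tar.gz s3://my-backup-bucket/
    envFrom:
      - secretRef:
          name: aws-backup-creds  # expects AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    volumeMounts:
      - mountPath: /data
        name: backup-volume
```

The restore Job could mirror this with `aws s3 cp s3://…` followed by the `tar -x` step.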

Adam-D-Lewis commented 2 weeks ago

I have also ssh'd into the NFS pod after creating the tarball, moved it to my user home directory, and then downloaded it via the JupyterHub UI, so that's an option as well rather than uploading to object storage.
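If going through the JupyterHub UI is awkward, a hedged alternative sketch is to pull the tarball straight to your machine with `kubectl cp`. The pod name `nfs-server-0` and the in-pod path are placeholders; find the real ones on your cluster first:

```shell
# Find the pod that mounts the shared volume (name below is a placeholder)
kubectl -n dev get pods

# Copy the tarball out of the pod into the local working directory
kubectl cp dev/nfs-server-0:/data/2024-03-08-shared.tar.gz ./2024-03-08-shared.tar.gz
```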