Investigate Longhorn for storage.

tdudgeon commented 3 years ago

Switching to a more sophisticated storage provisioner will give us more flexibility in managing storage, in particular redundancy and easier backup/restore.

Rancher's Longhorn looks like a good fit, but others could be considered.

tdudgeon commented 3 years ago

Longhorn is now deployed to the Dev cluster. It is using the large root volumes of the l2 instances. The default number of replicas is set to 2.

tdudgeon commented 3 years ago

Defining how Longhorn volumes are to be backed up still needs to be done.

tdudgeon commented 3 years ago

I eventually got the Longhorn backups to Echo S3 working. The trick was to get the Backup Target property and the content of the Backup Target Credential Secret correct.

The Backup Target needs to be s3://bucket-name@us-east-1. The region has no meaning because Echo does not have regions, but a value has to be specified. Also, bucket-name seems to need to be a top-level bucket.

The Backup Target Credential Secret needs to include an AWS_ENDPOINTS property with the value https://s3.echo.stfc.ac.uk, as well as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY properties with the appropriate values for Echo.

With those settings I managed to do a manual backup and have now re-enabled scheduled backups. Files were put to S3.
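
As a rough sketch of the settings described above, the following Python builds the Backup Target string and a Kubernetes Secret manifest holding the three properties. The secret name (echo-s3-backup-secret), the longhorn-system namespace, and the placeholder credentials are assumptions for illustration, not values taken from the actual deployment.

```python
import base64
import json


def backup_target(bucket: str, region: str = "us-east-1") -> str:
    """Build the Backup Target value in the form given in this thread.

    The region is a dummy value: Echo has no regions, but one must be given.
    """
    return f"s3://{bucket}@{region}"


def credential_secret(
    name: str,
    access_key: str,
    secret_key: str,
    endpoint: str = "https://s3.echo.stfc.ac.uk",
    namespace: str = "longhorn-system",  # assumed namespace, adjust as needed
) -> dict:
    """Return a Secret manifest with the three properties Longhorn reads."""
    values = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_ENDPOINTS": endpoint,
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Values under "data" in a Secret must be base64-encoded strings.
        "data": {
            key: base64.b64encode(value.encode()).decode()
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical secret name and placeholder credentials.
    manifest = credential_secret(
        "echo-s3-backup-secret", "ECHO_ACCESS_KEY", "ECHO_SECRET_KEY"
    )
    print(backup_target("bucket-name"))
    print(json.dumps(manifest, indent=2))
```

The JSON that is printed can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML. The Backup Target string and the secret name then go into the corresponding Longhorn settings.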

tdudgeon commented 3 years ago

Longhorn seems to be working fine in the dev cluster. RWX volumes are also enabled.

tdudgeon commented 3 years ago

Trying to verify whether the scheduled backups are cleaning up old backups from S3. The best way seems to be to monitor the size of the data on S3 and make sure it doesn't continually increase.

On 24 May:

$ rclone size echos3-xchem-1:/im-longhorn-backups-dev/
Total objects: 43018
Total size: 28.699 GBytes (30815021296 Bytes)
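
To make this check repeatable, the rclone size output can be parsed and compared between dates. This is a minimal sketch, assuming the text format shown above; the function names and the 10% growth tolerance are illustrative choices, not part of rclone or Longhorn.

```python
import re

# Patterns match the output format shown in this thread; other rclone
# versions may print these lines differently.
COUNT_RE = re.compile(r"Total objects:\s*(\d+)\s*$", re.MULTILINE)
SIZE_RE = re.compile(r"Total size:.*\((\d+) Bytes?\)")


def parse_rclone_size(text: str) -> tuple[int, int]:
    """Return (object_count, size_in_bytes) from `rclone size` text output."""
    count = COUNT_RE.search(text)
    size = SIZE_RE.search(text)
    if count is None or size is None:
        raise ValueError("unrecognised rclone size output")
    return int(count.group(1)), int(size.group(1))


def grew_beyond(previous_bytes: int, current_bytes: int,
                tolerance: float = 0.10) -> bool:
    """True if the current size exceeds the previous one by more than tolerance."""
    return current_bytes > previous_bytes * (1 + tolerance)


if __name__ == "__main__":
    sample = (
        "Total objects: 43018\n"
        "Total size: 28.699 GBytes (30815021296 Bytes)\n"
    )
    objects, size_bytes = parse_rclone_size(sample)
    print(objects, size_bytes)
```

Recording the byte count on each check and passing consecutive values to grew_beyond gives a simple signal for whether old backups are being removed or the bucket keeps growing.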
