kousu opened this issue 3 years ago
Right now we only have two backups: one on spineimage.ca:/var, which doesn't really count as a good backup, and the one I created above, which, like the server, is on Arbutus in Victoria, so a single natural disaster could wipe out all the data. Moreover, there are not very many keyholders -- just me at the moment -- and the data is stored inside an OpenStack project owned by @jcohenadad, all of which makes neuropoly a single point of failure.
We should have other physical locations, to protect against natural disasters; the data sharing agreement requires us to stick to ComputeCanada as a line of defense against leaks, but most of their clusters now run OpenStack, so we can choose a physical location other than Arbutus.
We should also have other keyholders, ones who do not work for neuropoly so Praxis doesn't risk losing the data if we mess up or are attacked and get our accounts locked or wiped.
Towards all this I have been asking Praxis for help, and they have found a keyholder. This person has been granted a separate ComputeCanada account and is ready to take on keyholding. They are apparently comfortable with the command line, though they don't have a lot of time to be involved; still, they can hold the keys and, hopefully, bootstrap disaster recovery when needed.
In February, I emailed tech support, because despite seeing the list of alternate clouds, the sign-up form doesn't provide a way to request one. They were extremely helpful about this:
To: "Nick Guenther" nick.guenther@polymtl.ca From: Jean-François Landry via Cloud Support cloud@tech.alliancecan.ca Date: Fri, 17 Feb 2023 22:02:57 +0000
> 2023-02-17 16:08 (America/Toronto) - Nick Guenther wrote:
> How may we request resources on cedar.cloud.computecanada.ca or beluga.cloud.computecanada.ca? The Google form at https://docs.google.com/forms/d/e/1FAIpQLSeU_BoRk5cEz3AvVLf3e9yZJq-OvcFCQ-mg7p4AWXmUkd5rTw/viewform doesn't allow choosing which cluster to use.
There is no specific option, just ask nicely in the free form description box.
> Also, may we request a cloud allocation of only object storage? The form forces us to allocate at least 1 VM and one 20GB disk and 1 IP. Allocating and not using a virtual disk isn't that expensive for you, but allocating and not using an IP address is quite so, and I don't want to waste one.
You can. Again, no specific "object store only" cloud RAS allocation, just fill in the minimum for VCPU/RAM etc. and please explain in the free form description box.
You can get up to 10TB of object storage through cloud RAS.
They also added:
> There is no geo-distributed storage system period, but the Arbutus object store works great with restic (note that restic tries to pack chunks into 16MB minimum objects by default, so it will not generate hundreds of millions of tiny objects). Also please update to the latest 0.15.1 release; the new v2 repo format is considered stable and does include zstd compression by default.
So I don't expect any problems requesting storage for backups from them. It sounds like they're familiar with restic and use it regularly themselves.
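For reference, pointing restic at the Arbutus object store looks roughly like the sketch below. The bucket name and credentials here are placeholders, not the real ones; the s3 keypair comes from `openstack ec2 credentials create`.

```sh
# All values are placeholders; the s3 keypair comes from `openstack ec2 credentials create`.
export AWS_ACCESS_KEY_ID=<access>
export AWS_SECRET_ACCESS_KEY=<secret>
export RESTIC_REPOSITORY=s3:object-arbutus.cloud.computecanada.ca/<bucket>
export RESTIC_PASSWORD=<password>

restic init        # once per new repository; restic >= 0.15 defaults to the v2 format with zstd
restic backup /srv/gitea
restic snapshots   # sanity check; the same s3 creds also work with e.g.
                   #   aws --endpoint-url=https://object-arbutus.cloud.computecanada.ca s3 ls
```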
I realized that for the existing backups there is only one restic key credential, and probably only one s3 credential to go with it at the moment -- the one used by the bot:
```
$ echo $RESTIC_REPOSITORY
s3:object-arbutus.cloud.computecanada.ca/def-jcohen-test2
$ restic key list
repository 2d22bf7f opened (version 1)
found 2 old cache directories in /home/kousu/.cache/restic, run `restic cache --cleanup` to remove them
 ID        User   Host           Created
----------------------------------------------------
*8bd433bf  gitea  spineimage.ca  2022-11-30 20:57:11
----------------------------------------------------
```
I am going to add s3+restic key credentials for:
I've done this by running the following for each person:
```sh
# create a per-person s3 credential pair
openstack ec2 credentials create -c access -c secret
# generate a random 100-character restic password
PW=$(pwgen 100 1); echo "RESTIC_PASSWORD=$PW"
# register the password as a new key on the repo (restic prompts for it twice)
(echo $PW; echo $PW) | restic key add --user $name --host $institution.tld
```
I have the notes saved in /tmp and will be distributing them as securely as I can.
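Spelled out as a loop, the whole thing looks roughly like this; the names and hosts below are hypothetical placeholders, not the real keyholders:

```sh
# Hypothetical keyholders as "username host" pairs; substitute the real ones.
for person in "alice example-university.ca" "bob praxis.example.org"; do
  set -- $person; name=$1; host=$2
  # per-person s3 credential pair
  openstack ec2 credentials create -c access -c secret > "/tmp/$name.s3creds"
  # per-person restic repository password
  PW=$(pwgen 100 1)
  echo "RESTIC_PASSWORD=$PW" > "/tmp/$name.restic"
  # register the password as a new key on the repo (restic prompts for it twice)
  (echo "$PW"; echo "$PW") | restic key add --user "$name" --host "$host"
done
```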
On Wednesday the 24th we are going to have a meeting with Praxis's nominee where we:
- Have them install restic (`brew install restic` or `apt install restic`)
- Provide them restic credentials to the existing backups
- Test by having them run `restic snapshots` and `restic ls latest`
- Mention that restic's disaster recovery docs are at https://restic.readthedocs.io/en/stable/050_restore.html
- Mention that the creds include s3 creds, so they can also be used with s3cmd or aws-cli
- Walk them through requesting a cloud project of their own.
It should be on Graham, geographically separate from the existing server and backups, and it doesn't need an IP address wasted on it. Here's the application form filled out with copy-pasteable answers:
EDIT: we were misinformed.
> 2023-05-24 16:01 Lucas Whittington via Cloud Support wrote:
> Unfortunately, Arbutus is the only Alliance cloud that provides object storage. It is stored on separate machines from our volume cluster but won't protect you in the event of an incident that affects our entire data centre. Let me know if you would like to proceed.
Instead, we will build our own file server on the other cluster; I'll use either minio or just sftp (see the sftp sketch below). Here's the updated request:
- Leave them with instructions on how to generate and send us countervailing s3 creds.
We need s3 credentials generated for:
Please forward the credentials privately to each individual keyholder. We will discuss at the meeting what the safest way to do that is.
I will initialize the new repository and then hand out RESTIC_PASSWORDs to all the keyholders, generated with `pwgen 100 1`.
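If we do go the sftp route, restic supports an sftp backend natively, so the new file server wouldn't need anything beyond an ssh account; a minimal sketch, with a hypothetical hostname, user, and path:

```sh
# hostname, user, and path are hypothetical placeholders
export RESTIC_REPOSITORY=sftp:backup@graham-backup.example.ca:/srv/restic-repo
restic init   # creates the second repository; per-keyholder keys get added as above
```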
I just came up against this after rebooting:
```
Nov 07 07:39:30 spineimage.ca systemd[1]: Finished systemd-networkd-wait-online.service - Wait for Network to be Configured.
Nov 07 07:39:31 spineimage.ca systemd[1]: dev-disk-by\x2duuid-2067a784\x2d07ef\x2d4317\x2d88d0\x2d4591442577d1.device: Job dev-disk-by\x2duuid-2067a784\x2d>
Nov 07 07:39:31 spineimage.ca systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-2067a784\x2d07ef\x2d4317\x2d88d0\x2d4591442577d1.device - /dev/d>
Nov 07 07:39:31 spineimage.ca systemd[1]: Dependency failed for systemd-fsck@dev-disk-by\x2duuid-2067a784\x2d07ef\x2d4317\x2d88d0\x2d4591442577d1.service ->
Nov 07 07:39:31 spineimage.ca systemd[1]: Dependency failed for srv-gitea.mount - /srv/gitea.
Nov 07 07:39:31 spineimage.ca systemd[1]: Dependency failed for gitea.service - Gitea (Git with a cup of tea).
Nov 07 07:39:31 spineimage.ca systemd[1]: gitea.service: Job gitea.service/start failed with result 'dependency'.
Nov 07 07:39:31 spineimage.ca systemd[1]: srv-gitea.mount: Job srv-gitea.mount/start failed with result 'dependency'.
```
i.e. /srv/gitea wasn't mounted, so gitea wasn't running. Can we make this more reliable somehow?
After a second reboot, it came up fine. So I don't know, maybe it was a fluke.
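If it happens again, one knob worth trying (an untested sketch; the filesystem type is an assumption) is giving the disk more time to appear before systemd gives up on it, via x-systemd.device-timeout in /etc/fstab; the default device timeout is 90 seconds, which appears to be what expired here:

```sh
# /etc/fstab -- UUID taken from the log above; the ext4 type is assumed
UUID=2067a784-07ef-4317-88d0-4591442577d1  /srv/gitea  ext4  defaults,x-systemd.device-timeout=5min  0  2
```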
https://praxisinstitute.org wants to fund a Canada-wide spine scan sharing platform.
They were considering paying OBI as a vendor to set up a neuroimaging repository, but they had doubts about the quality of that solution, looked around for other options, and landed on asking us for help.
We've proposed a federated data sharing plan and they are interested in pursuing this line.
Needs