Closed kousu closed 3 years ago
According to archwiki, we can avoid this in the future by appending noauto,x-systemd.automount to the options of the /srv/git/repositories line in /etc/fstab. This will enable the server to boot even if the storage disk messes up.
An older alternative for the same thing is autofs. I am skeptical about systemd in general, but in this case I think it is the simpler solution.
(EDIT: this is just a systemd wrapper around autofs; still, it is the simpler option)
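As a rough sketch of the fstab change described above: the helper below prints a copy of an fstab file with noauto,x-systemd.automount appended to the options of one mount point. The function name add_automount is made up for illustration. It only writes to stdout, so the result can be reviewed before anything touches the real /etc/fstab.

```shell
# add_automount FSTAB MOUNTPOINT
# Print FSTAB with noauto,x-systemd.automount appended to the options
# (4th field) of the entry whose mount point (2nd field) is MOUNTPOINT.
# Hypothetical helper; it never modifies FSTAB itself.
add_automount() {
    awk -v mp="$2" '
        # Pass comments and blank lines through untouched
        /^[ \t]*(#|$)/ { print; next }
        # Append the options once; skip entries that already have them
        $2 == mp && $4 !~ /x-systemd\.automount/ {
            $4 = $4 ",noauto,x-systemd.automount"
        }
        { print }
    ' "$1"
}
```

Something like `add_automount /etc/fstab /srv/git/repositories > /tmp/fstab.new` followed by a diff against the original would be one cautious way to use it. Note that awk rewrites the whitespace of the line it edits to single spaces.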
The server broke due to some kind of nasty driver bug with the storage disk: https://github.com/neuropoly/datalad/issues/21
We now have a new 1TB disk, currently at /dev/sdc. I've been promised that it is an expandable virtual disk, so that size is just a quota and not a hard upper limit, so I am going to replace the old storage disk with this one. When it runs out of space, we'll have to ask Jean-Sébastien to resize it, then grow the partition and run resize2fs to get access to the extra space (that is, until we hit linux filesystem limits, but that's for a later date).
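For when that day comes, here is a hedged sketch of the grow step. The e2fsprogs tool that grows an ext4 filesystem is resize2fs. Because the filesystem lives in a partition (sdc1) rather than on the bare disk, the partition has to be grown first. The sketch assumes growpart (from cloud-utils) is available for that, which has not been checked on this server. The names run and grow_fs are made up, and everything is a dry run unless DRY_RUN=0 is set.

```shell
# run CMD...: print the command in dry-run mode (the default), else execute it
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# grow_fs DISK PARTNUM: grow partition PARTNUM of DISK to fill the disk,
# then grow the ext4 filesystem inside it. Hypothetical helper.
# Assumes sdX-style names, where the partition device is DISK + PARTNUM.
grow_fs() {
    disk=$1 part=$2
    run growpart "$disk" "$part" || return 1   # assumption: growpart is installed
    run resize2fs "$disk$part"                 # with no size given, fills the partition
}
```

Calling `grow_fs /dev/sdc 1` prints the two commands; `DRY_RUN=0 grow_fs /dev/sdc 1` as root would run them.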
root@data:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 97.7M 1 loop /snap/core/10185
loop1 7:1 0 97.9M 1 loop /snap/core/10444
loop2 7:2 0 55.3M 1 loop /snap/core18/1885
loop3 7:3 0 55.4M 1 loop /snap/core18/1932
sda 8:0 0 127G 0 disk
├─sda1 8:1 0 512M 0 part /boot/efi
└─sda2 8:2 0 126.5G 0 part /
sdb 8:16 0 1T 0 disk
└─sdb1 8:17 0 1024G 0 part
sdc 8:32 0 1T 0 disk
sr0 11:0 1 1024M 0 rom
root@data:~# fdisk /dev/sdc
Welcome to fdisk (util-linux 2.36).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xb5c77a8b.
Command (m for help): p
Disk /dev/sdc: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Disk model: Virtual Disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xb5c77a8b
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p):
Using default response p.
Partition number (1-4, default 1):
First sector (2048-2147483647, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-2147483647, default 2147483647):
Created a new partition 1 of type 'Linux' and of size 1024 GiB.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@data:~# mkfs.ext4 -L "neuropoly-data" /dev/sdc1
mke2fs 1.45.6 (20-Mar-2020)
Discarding device blocks: done
Creating filesystem with 268435200 4k blocks and 67108864 inodes
Filesystem UUID: d6d8c87e-fe67-4739-b44d-98f88f243364
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
Open up /etc/fstab to reenable the mount, swapping the new filesystem in; additionally, include the preventative measure mentioned above:
root@data:~# vi /etc/fstab
root@data:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda2 during installation
UUID=49cd31d6-7a4f-476a-80ba-2631bbb6a12a / ext4 errors=remount-ro 0 1
# /boot/efi was on /dev/sda1 during installation
UUID=217B-52E8 /boot/efi vfat umask=0077 0 1
/swapfile none swap sw 0 0
# datasets
UUID=d6d8c87e-fe67-4739-b44d-98f88f243364 /srv/git/repositories ext4 errors=remount-ro,noauto,x-systemd.automount 0 1
Fix up the top-level filesystem permissions:
root@data:~# mount /srv/git/repositories
root@data:~# chown -R git:git /srv/git/repositories
Deal with the "lost+found" glitch in the same bad bad bad incomplete way I did before:
root@data:~# rmdir /srv/git/repositories/lost+found/
Reboot at this point to make sure it takes:
root@data:~# reboot
When logged back in, at first /srv/git/repositories is not mounted, but as soon as something touches it it shows up:
git@data:~$ mount
[...]
/dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro)
[...]
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=809468k,nr_inodes=202367,mode=700,uid=1001,gid=1001)
git@data:~$ ls -la repositories
total 8
drwxr-xr-x 2 git git 4096 Dec 8 07:12 .
drwxr-xr-x 11 git git 4096 Dec 8 04:25 ..
git@data:~$ mount
[...]
/dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro)
[...]
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=809468k,nr_inodes=202367,mode=700,uid=1001,gid=1001)
/dev/sdc1 on /srv/git/repositories type ext4 (rw,relatime,errors=remount-ro,x-systemd.automount)
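The before/after check in this transcript could be scripted roughly as follows. This is a sketch: is_mounted is a made-up name, and it assumes the not-yet-triggered automount point shows up in /proc/mounts with filesystem type autofs, which is how systemd automounts normally appear but was not captured in the elided output above.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeed if a real (non-autofs) filesystem is mounted on MOUNTPOINT.
# MOUNTS_FILE defaults to /proc/mounts; fields are: device mountpoint type ...
is_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 != "autofs" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

For example, `is_mounted /srv/git/repositories || echo "not mounted yet"` right after boot, then again after an `ls` of the directory.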
The main problem is I hadn't set up backups yet (#20). We hadn't put much data in yet, so the thought didn't cross my mind. Egg on my face. I am embarrassed.
Surveying what I have:
gitolite-admin.git
nguenther@data:~/datasets/large$ git remote -v
origin git@data.neuro.polymtl.ca:datasets/sct-testing-large.git (fetch)
origin git@data.neuro.polymtl.ca:datasets/sct-testing-large.git (push)
So here's how to get it back together:
Restore gitolite-admin from the most recent copy I have.
Copy ~git/.gitolite/keydir/konstantinos.pub and paste it back into ssh git@... keys add
Copy ~git/.gitolite/keydir/konstantinos@acheron.pub and paste it back into ssh git@... keys add
(cd ~nguenther/datasets/large && git push origin && git annex copy --to origin)
(cd ~/src/neuropoly/datalad/data-single-subject && git push internal && git annex copy --to internal)
First, save the missing pubkeys for later in case I wipe them out. If I do, I can recover them by asking Konstantinos for them, but hopefully I don't have to bother him:
nguenther@data:~/gitolite-admin$ sudo -i -u git bash
git@data:~$ cat .gitolite/keydir/konstantinos@acheron.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDUyn5vweHYvJnQcwu79yiRHwS+ZY2HcD1HShP4xJ1gKHMXlhlVbZ6gx2yYyRV6eOdKOIplyNPw5zOjd8pXYsjMLtZru2brLDNoynzwFqJY8VqfZRVhHKQnnU056dtT16Qp2u+DfeOvJhANYiSlnrMV0W+/nup4PoiWarseOPNySdeBo80k/oWJLp8kn9kXTemIa3ZOtNLWFWN4kxyVIA5F5l7rIzmpaBRjx8TuibP9afQKFLDw3vfNBEFzc0/oCYE6GWApvoxwfnP4AIHjL5WZ8TDy9I5RrlNCxhBxRVau4WXhOvAj58IiB/9I1Hi2g178qc9dTBYx0GM1Cbg7RWQsEdua6qabdE2L2wG3oPoQmfcQqtrRsFW5nfOOZ3U8hSk9YX/hlpa2y68EyC/+x2Yt9irDG6mGgfyIY3T8dhGerMgZ9BOOpVwuzVZiLrpZnPJc8kljdaiwS4Olqo9jh5FO/k7U9is56ODFyTGuQqW1H8O2BkIJtAva5E6xOTtWmwE=
git@data:~$ cat .gitolite/keydir/konstantinos.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCi8Fdy9RMf+pwQLW6h5dGRMKbnsRc2JVTC5upLdnms7cUQjJE/6sBoHbgQF9BusgYdvug8qR/HacJeEnITpndT1o2ddGVXWZdxYrBC3yRUHECUwL0oib3WgfKkYv3XfeUZgLHZTTIyUNNB44JXiVpJljBE0OImbt2tg8Lp3JNoRXlYR4iH6973BMyA0hi8aG1ubHxPxL+13NZXP81CZLI6w9s5KiQENKkF4AcmoSm8A5HDoi9Ea2YcqxwIn7jz1VryROFNoRNBT5+7ldw3GAHXl4uGoji9rfUlXSHKLsxA4ZG3lum6jVgMz9Wpe0uIYLbpa0g8V0Yzr9TkjVWR9IQZ u111358@rosenberg
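Rather than relying on the terminal scrollback, the keys could also be copied to a scratch directory first. A minimal sketch, with save_keys as a made-up helper name and the destination directory chosen by whoever runs it:

```shell
# save_keys KEYDIR DEST
# Copy every *.pub file from KEYDIR into DEST, keeping the file names
# (the name is what gitolite uses as the user/key identity).
save_keys() {
    src=$1 dest=$2
    mkdir -p "$dest" || return 1
    for key in "$src"/*.pub; do
        [ -e "$key" ] || continue          # the glob matched nothing
        cp -p "$key" "$dest/" || return 1
        echo "saved $(basename "$key")"
    done
}
```

Keeping the files, not just their contents, also preserves which key belonged to which name, which matters when they get re-added later.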
Now, reinit the core repo. I think the spare copy I had on the server is pretty good: it's from November 10th and the outage was on November 14th, so not very much could have happened in between:
nguenther@data:~$ cd gitolite-admin/
nguenther@data:~/gitolite-admin$ pwd
/home/nguenther/gitolite-admin
nguenther@data:~/gitolite-admin$ git log -n 1 master
commit 3b3ad80bf8c20e424b8d345c46c5a519401cc79b (HEAD -> master, origin/master, origin/HEAD)
Author: git on data.neuro.polymtl.ca <git@data.neuro.polymtl.ca>
Date: Tue Nov 10 14:39:40 2020 -0500
keys: add nguenther@server (SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA)
nguenther@data:~/gitolite-admin$ git push --all origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
FATAL: W any gitolite-admin nguenther DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
hm. I guess it needs at least an empty repo on the other side. Oh yes, that makes sense; gitolite-admin is not a wildrepo, so it won't be created on a push.
Two steps then: a) Recover from the earlier backup, b) push the more recent backup.
nguenther@data:~/gitolite-admin$ sudo -i -u git bash
git@data:~$ rsync -av repositories.bak/gitolite-admin.git repositories/
sending incremental file list
gitolite-admin.git/
gitolite-admin.git/COMMIT_EDITMSG
gitolite-admin.git/HEAD
gitolite-admin.git/config
gitolite-admin.git/description
gitolite-admin.git/gl-conf
gitolite-admin.git/index
gitolite-admin.git/branches/
gitolite-admin.git/hooks/
gitolite-admin.git/hooks/applypatch-msg.sample
gitolite-admin.git/hooks/commit-msg.sample
gitolite-admin.git/hooks/fsmonitor-watchman.sample
gitolite-admin.git/hooks/post-update -> /srv/git/.gitolite/hooks/gitolite-admin/post-update
gitolite-admin.git/hooks/post-update.sample
gitolite-admin.git/hooks/pre-applypatch.sample
gitolite-admin.git/hooks/pre-commit.sample
gitolite-admin.git/hooks/pre-merge-commit.sample
gitolite-admin.git/hooks/pre-push.sample
gitolite-admin.git/hooks/pre-rebase.sample
gitolite-admin.git/hooks/pre-receive.sample
gitolite-admin.git/hooks/prepare-commit-msg.sample
gitolite-admin.git/hooks/update -> /srv/git/.gitolite/hooks/common/update
gitolite-admin.git/hooks/update.sample
gitolite-admin.git/info/
gitolite-admin.git/info/exclude
gitolite-admin.git/logs/
gitolite-admin.git/logs/HEAD
gitolite-admin.git/logs/refs/
gitolite-admin.git/logs/refs/heads/
gitolite-admin.git/logs/refs/heads/master
gitolite-admin.git/objects/
gitolite-admin.git/objects/03/
gitolite-admin.git/objects/03/66efe7cb67b68f1830d99a67ae65166d6c3471
gitolite-admin.git/objects/06/
gitolite-admin.git/objects/06/7bb96fd8d465342925fae7de0fd2856a8440f8
gitolite-admin.git/objects/0a/
gitolite-admin.git/objects/0a/d7680237d632cb2dbca04964f5d19ceba72089
gitolite-admin.git/objects/1c/
gitolite-admin.git/objects/1c/bd348fa9ea330f1ab3f6aaec0019d3bafe05a5
gitolite-admin.git/objects/1d/
gitolite-admin.git/objects/1d/58f0b8835b052819efc21435c1eee0afbdbeb6
gitolite-admin.git/objects/3b/
gitolite-admin.git/objects/3b/f64657d682cfa84a1dfbd8847810936281de60
gitolite-admin.git/objects/3e/
gitolite-admin.git/objects/3e/93a6d538cdaef6a30a8a50f2074eed75c76679
gitolite-admin.git/objects/45/
gitolite-admin.git/objects/45/7f2d687c0fcf2091cbd231bed079b3236490d1
gitolite-admin.git/objects/46/
gitolite-admin.git/objects/46/5c1a35ecfb7b2ea2ddc6d55cb3df3ca772c8c6
gitolite-admin.git/objects/55/
gitolite-admin.git/objects/55/36d7abf29963dc27b82ad2c2c9de6aa7658d68
gitolite-admin.git/objects/5d/
gitolite-admin.git/objects/5d/ef6d56a44cb84f0f058d1ff5af8fc3022bc82d
gitolite-admin.git/objects/62/
gitolite-admin.git/objects/62/6538ba785086bd04aa612db2b6135cd3cd8d1a
gitolite-admin.git/objects/67/
gitolite-admin.git/objects/67/e1ce6ec686f106d660f174a3ed8f34bbaa731f
gitolite-admin.git/objects/6b/
gitolite-admin.git/objects/6b/3d7a4d6396b2b97d90526d7f2d70c991feeecc
gitolite-admin.git/objects/6b/910bb5a57aa5558a241850ecdd091bce0f6dcd
gitolite-admin.git/objects/6c/
gitolite-admin.git/objects/6c/0fff4de3e1a4ed04ae777f442c99ecee292420
gitolite-admin.git/objects/6c/e13516a8668d9df3e9dd93518dd3d59a0396a7
gitolite-admin.git/objects/70/
gitolite-admin.git/objects/70/7cbfcdb11ad51df0ea7b06a7b16336326b8f16
gitolite-admin.git/objects/71/
gitolite-admin.git/objects/71/a148807b0c0c1e5ef07a344c3783b5c9fbeb95
gitolite-admin.git/objects/79/
gitolite-admin.git/objects/79/d00243ce64b347092bdf7c19d3bc2be2d89fc5
gitolite-admin.git/objects/7f/
gitolite-admin.git/objects/7f/6f3ef81ff6f5e216fe04b794393e26a4d8e482
gitolite-admin.git/objects/90/
gitolite-admin.git/objects/90/323dfc76c2ec44592372a8fd010214d8cd4285
gitolite-admin.git/objects/96/
gitolite-admin.git/objects/96/447717c453e4269f226ac24dd49b4c76099393
gitolite-admin.git/objects/9d/
gitolite-admin.git/objects/9d/5bdfa9dee355bc475022fa5f3e5555a4c20747
gitolite-admin.git/objects/9e/
gitolite-admin.git/objects/9e/64c62987ebfcb0ccd03bfa675d445bf0d912a2
gitolite-admin.git/objects/a8/
gitolite-admin.git/objects/a8/f2963509fd9cc3f808dea9be3e91a9b6d76a75
gitolite-admin.git/objects/ac/
gitolite-admin.git/objects/ac/cb71aaff29e46ca82a2b723ec113c78930a093
gitolite-admin.git/objects/af/
gitolite-admin.git/objects/af/b338306a84dc348494f012327183bbd048a050
gitolite-admin.git/objects/b2/
gitolite-admin.git/objects/b2/7e2bbc1e419146c32261cadb90ba00c59fd6d6
gitolite-admin.git/objects/c1/
gitolite-admin.git/objects/c1/973eb6f81cf617d41171af7848d400a6999fdd
gitolite-admin.git/objects/d5/
gitolite-admin.git/objects/d5/1af9c42b41f45722a518037fd13457ea64ed9e
gitolite-admin.git/objects/d6/
gitolite-admin.git/objects/d6/17788e10970e41e53ae258e9080a2e4d43a779
gitolite-admin.git/objects/dc/
gitolite-admin.git/objects/dc/cfb341c43fd7e00c2eb17722c45fff338de09c
gitolite-admin.git/objects/e9/
gitolite-admin.git/objects/e9/c605c361a863bc2ec65aa583ed45c167f16758
gitolite-admin.git/objects/ea/
gitolite-admin.git/objects/ea/9d22a721be3d6298bdac9219a96ceaedd905a5
gitolite-admin.git/objects/fc/
gitolite-admin.git/objects/fc/640bb3e66e57bf2e72545623bbe6c4f9ba0285
gitolite-admin.git/objects/info/
gitolite-admin.git/objects/pack/
gitolite-admin.git/refs/
gitolite-admin.git/refs/heads/
gitolite-admin.git/refs/heads/master
gitolite-admin.git/refs/tags/
sent 38,288 bytes received 1,320 bytes 79,216.00 bytes/sec
total size is 31,802 speedup is 0.80
git@data:~$ exit
nguenther@data:~/gitolite-admin$ git push --all origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 533 bytes | 106.00 KiB/s, done.
Total 4 (delta 1), reused 3 (delta 0), pack-reused 0
To data.neuro.polymtl.ca:gitolite-admin
9d5bdfa..3b3ad80 master -> master
Test:
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca info
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
hello nguenther, this is git@data running gitolite3 3.6.11-2 (Debian) on git 2.27.0
R W C CREATOR/..*
R W C datasets/..*
R W gitolite-admin
Re-add Konstantinos:
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys add konstantinos
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDUyn5vweHYvJnQcwu79yiRHwS+ZY2HcD1HShP4xJ1gKHMXlhlVbZ6gx2yYyRV6eOdKOIplyNPw5zOjd8pXYsjMLtZru2brLDNoynzwFqJY8VqfZRVhHKQnnU056dtT16Qp2u+DfeOvJhANYiSlnrMV0W+/nup4PoiWarseOPNySdeBo80k/oWJLp8kn9kXTemIa3ZOtNLWFWN4kxyVIA5F5l7rIzmpaBRjx8TuibP9afQKFLDw3vfNBEFzc0/oCYE6GWApvoxwfnP4AIHjL5WZ8TDy9I5RrlNCxhBxRVau4WXhOvAj58IiB/9I1Hi2g178qc9dTBYx0GM1Cbg7RWQsEdua6qabdE2L2wG3oPoQmfcQqtrRsFW5nfOOZ3U8hSk9YX/hlpa2y68EyC/+x2Yt9irDG6mGgfyIY3T8dhGerMgZ9BOOpVwuzVZiLrpZnPJc8kljdaiwS4Olqo9jh5FO/k7U9is56ODFyTGuQqW1H8O2BkIJtAva5E6xOTtWmwE=
^D
Added SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys add konstantinos@rosenberg
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCi8Fdy9RMf+pwQLW6h5dGRMKbnsRc2JVTC5upLdnms7cUQjJE/6sBoHbgQF9BusgYdvug8qR/HacJeEnITpndT1o2ddGVXWZdxYrBC3yRUHECUwL0oib3WgfKkYv3XfeUZgLHZTTIyUNNB44JXiVpJljBE0OImbt2tg8Lp3JNoRXlYR4iH6973BMyA0hi8aG1ubHxPxL+13NZXP81CZLI6w9s5KiQENKkF4AcmoSm8A5HDoi9Ea2YcqxwIn7jz1VryROFNoRNBT5+7ldw3GAHXl4uGoji9rfUlXSHKLsxA4ZG3lum6jVgMz9Wpe0uIYLbpa0g8V0Yzr9TkjVWR9IQZ u111358@rosenberg
^D
Added SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys list
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Hello nguenther, you are an admin.
These are all registered keys:
============================
1: SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alfoi.pub
2: SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub
3: SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub
4: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
5: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
6: SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
7: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub
8: SHA256:6w9uivbXYfjnDEz3NukOB3L9IZFdHj8qZn0BXiSTl4o : nguenther@requiem.pub
9: SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA : nguenther@server.pub
Hm, a weird inconsistency: I accidentally renamed konstantinos@acheron.pub -> konstantinos.pub and konstantinos.pub -> konstantinos@rosenberg.pub, and now gitolite shows all three names. I'm surprised; I would think the list would match what is in the admin repo:
nguenther@data:~/gitolite-admin$ git pull --rebase
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
Unpacking objects: 100% (12/12), 1.91 KiB | 1.92 MiB/s, done.
remote: Total 12 (delta 4), reused 0 (delta 0), pack-reused 0
From data.neuro.polymtl.ca:gitolite-admin
3b3ad80..eac44aa master -> origin/master
Updating 3b3ad80..eac44aa
Fast-forward
keydir/konstantinos.pub | 2 ++
keydir/konstantinos@rosenberg.pub | 1 +
2 files changed, 3 insertions(+)
create mode 100644 keydir/konstantinos.pub
create mode 100644 keydir/konstantinos@rosenberg.pub
nguenther@data:~/gitolite-admin$ ls keydir/
alfoi.pub andreannelemay.pub jcohen.pub konstantinos.pub konstantinos@rosenberg.pub nguenther.pub nguenther@requiem.pub nguenther@server.pub
nguenther@data:~/gitolite-admin$
They're not in the admin repo. The wrong ones must be literally...on the disk. Ugh. That's the first big strike I've had against gitolite.
Patch this over in this very silly way:
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys list
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Hello nguenther, you are an admin.
These are all registered keys:
============================
1: SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alfoi.pub
2: SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub
3: SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub
4: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
5: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
6: SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
7: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub
8: SHA256:6w9uivbXYfjnDEz3NukOB3L9IZFdHj8qZn0BXiSTl4o : nguenther@requiem.pub
9: SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA : nguenther@server.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys add konstantinos@acheron
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDUyn5vweHYvJnQcwu79yiRHwS+ZY2HcD1HShP4xJ1gKHMXlhlVbZ6gx2yYyRV6eOdKOIplyNPw5zOjd8pXYsjMLtZru2brLDNoynzwFqJY8VqfZRVhHKQnnU056dtT16Qp2u+DfeOvJhANYiSlnrMV0W+/nup4PoiWarseOPNySdeBo80k/oWJLp8kn9kXTemIa3ZOtNLWFWN4kxyVIA5F5l7rIzmpaBRjx8TuibP9afQKFLDw3vfNBEFzc0/oCYE6GWApvoxwfnP4AIHjL5WZ8TDy9I5RrlNCxhBxRVau4WXhOvAj58IiB/9I1Hi2g178qc9dTBYx0GM1Cbg7RWQsEdua6qabdE2L2wG3oPoQmfcQqtrRsFW5nfOOZ3U8hSk9YX/hlpa2y68EyC/+x2Yt9irDG6mGgfyIY3T8dhGerMgZ9BOOpVwuzVZiLrpZnPJc8kljdaiwS4Olqo9jh5FO/k7U9is56ODFyTGuQqW1H8O2BkIJtAva5E6xOTtWmwE=
Added SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys del konstantinos
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Removed SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys list
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Hello nguenther, you are an admin.
These are all registered keys:
============================
1: SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alfoi.pub
2: SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub
3: SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub
4: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
5: SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
6: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub
7: SHA256:6w9uivbXYfjnDEz3NukOB3L9IZFdHj8qZn0BXiSTl4o : nguenther@requiem.pub
8: SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA : nguenther@server.pub
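The symptom in the earlier listings, one fingerprint registered under two names, is easy to miss by eye. A small sketch for spotting it mechanically, assuming the `keys list` output format shown above (number, fingerprint, colon, file name); dup_fingerprints is a made-up name:

```shell
# dup_fingerprints: read `keys list` output on stdin and print every
# fingerprint that is registered under more than one key name.
dup_fingerprints() {
    awk '
        # Data lines look like: " 4: SHA256:... : name.pub"
        $2 ~ /^SHA256:/ {
            names[$2] = (names[$2] == "" ? $4 : names[$2] " " $4)
            count[$2]++
        }
        END {
            for (fp in count)
                if (count[fp] > 1) print fp ": " names[fp]
        }
    '
}
```

It would be used as `ssh git@data.neuro.polymtl.ca keys list | dup_fingerprints`; no output means every fingerprint appears once.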
nguenther@data:~$ mv datasets/large/ datasets/sct-testing-large # catch up with the repo rename we did
nguenther@data:~$ cd datasets/sct-testing-large/
nguenther@data:~/datasets/sct-testing-large$ git remote -v # the remotes are already set correctly, though
origin git@data.neuro.polymtl.ca:datasets/sct-testing-large.git (fetch)
origin git@data.neuro.polymtl.ca:datasets/sct-testing-large.git (push)
Upload the git part:
nguenther@data:~/datasets/sct-testing-large$ git push --all origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Initialized empty Git repository in /srv/git/repositories/datasets/sct-testing-large.git/
Enumerating objects: 84314, done.
Counting objects: 100% (84314/84314), done.
Compressing objects: 100% (62957/62957), done.
Writing objects: 100% (84314/84314), 7.30 MiB | 19.17 MiB/s, done.
Total 84314 (delta 25819), reused 74365 (delta 15870), pack-reused 0
remote: Resolving deltas: 100% (25819/25819), done.
To data.neuro.polymtl.ca:datasets/sct-testing-large.git
* [new branch] git-annex -> git-annex
* [new branch] master -> master
* [new branch] synced/master -> synced/master
great!
But, new problem when trying to upload the annex part:
nguenther@data:~/datasets/sct-testing-large$ git annex copy --to origin
[...]
copy derivatives/labels/sub-amuAMU15005/anat/sub-amuAMU15005_T2star_gmseg-manual.nii.gz (checking origin...) git-annex-shell: expected repository UUID de2707ce-a9b6-4815-9f3d-edff5c166624 but found uninitialized repository
(to origin...)
git-annex-shell: expected repository UUID de2707ce-a9b6-4815-9f3d-edff5c166624 but found uninitialized repository
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [sender=3.2.3]
rsync exited 12
rsync failed -- run git annex again to resume file transfer
failed
copy derivatives/labels/sub-amuAMU15005/anat/sub-amuAMU15005_T2star_seg-manual.nii.gz (checking origin...) git-annex-shell: expected repository UUID de2707ce-a9b6-4815-9f3d-edff5c166624 but found uninitialized repository
[...]
git-annex does this location tracking, which would be more helpful if it weren't so tightly integrated, I think. It is expecting to find a repo that isn't there anymore and it's balking. But I uploaded to an empty repo before, so what's the difference?
I looked around, thought about it, and basically just lucked into realizing the old UUID was probably cached in .git/config, and sure enough:
nguenther@data:~/datasets/sct-testing-large$ git config --unset remote.origin.annex-uuid
nguenther@data:~/datasets/sct-testing-large$ time git annex copy --to origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
copy derivatives/labels/sub-amuAMU15001/anat/sub-amuAMU15001_T2star_gmseg-manual.nii.gz (to origin...)
ok
copy derivatives/labels/sub-amuAMU15001/anat/sub-amuAMU15001_T2star_seg-manual.nii.gz (to origin...)
ok
copy derivatives/labels/sub-amuAMU15002/anat/sub-amuAMU15002_T2star_gmseg-manual.nii.gz (to origin...)
ok
copy derivatives/labels/sub-amuAMU15002/anat/sub-amuAMU15002_T2star_seg-manual.nii.gz (to origin...)
ok
copy derivatives/labels/sub-amuAMU15003/anat/sub-amuAMU15003_T2star_gmseg-manual.nii.gz (to origin...)
[...]
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-3_T1w.nii.gz (to origin...)
ok
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-4_T1w.nii.gz (to origin...)
ok
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-5_T1w.nii.gz (to origin...)
ok
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-6_T1w.nii.gz (to origin...)
ok
(recording state in git...)
real 11m23.040s
user 0m37.712s
sys 0m37.427s
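A sketch of the same fix as a reusable step: it clears the cached UUID for a remote but prints the old value first, since that UUID is still needed afterwards to mark the lost repository dead. The function name forget_remote_uuid is made up; the only git calls used are `git config --get` and `git config --unset`.

```shell
# forget_remote_uuid REMOTE
# Remove the cached git-annex UUID for REMOTE from .git/config and print
# the old value on stdout. Fails if no UUID was cached.
forget_remote_uuid() {
    remote=$1
    old=$(git config --get "remote.$remote.annex-uuid") || {
        echo "no cached annex-uuid for remote '$remote'" >&2
        return 1
    }
    git config --unset "remote.$remote.annex-uuid" || return 1
    echo "$old"
}
```

For instance `OLD=$(forget_remote_uuid origin)` before the `git annex copy --to origin`, keeping $OLD around for a later `git annex dead`.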
And now that I've done this I should make sure to dead the missing repo; this UUID is what git config remote.origin.annex-uuid was before:
nguenther@data:~/datasets/sct-testing-large$ git annex dead de2707ce-a9b6-4815-9f3d-edff5c166624
dead de2707ce-a9b6-4815-9f3d-edff5c166624 ok
(recording state in git...)
nguenther@data:~/datasets/sct-testing-large$ git annex sync --content origin
commit
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
ok
pull origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519':
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
ok
push origin
Enumerating objects: 32584, done.
Counting objects: 100% (32584/32584), done.
Compressing objects: 100% (21197/21197), done.
Writing objects: 100% (22643/22643), 2.05 MiB | 11.75 MiB/s, done.
Total 22643 (delta 9546), reused 13613 (delta 516), pack-reused 0
remote: Resolving deltas: 100% (9546/9546), completed with 9030 local objects.
To data.neuro.polymtl.ca:datasets/sct-testing-large.git
* [new branch] git-annex -> synced/git-annex
ok
Spot check that git-annex thinks there's now only one copy of each thing:
nguenther@data:~/datasets/sct-testing-large$ git annex whereis derivatives/labels/sub-bwh028/
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-ax_T2w_lesion-manual.nii.gz (1 copy)
6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-ax_T2w_seg-manual.nii.gz (1 copy)
6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sag_T2w_labels-disc-manual.nii.gz (1 copy)
6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sagstir_T2w_labels-disc-manual.nii.gz (1 copy)
6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sagstir_T2w_lesion-manual.nii.gz (1 copy)
6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sagstir_T2w_seg-manual.nii.gz (1 copy)
6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
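To go beyond a spot check, the copy counts can be pulled out of the whereis output and compared against an expected number. A sketch that assumes the output format shown above, with the count in parentheses at the end of each whereis line; check_copies is a made-up name:

```shell
# check_copies N: read `git annex whereis` output on stdin, print every
# file whose copy count differs from N, and fail if there was any.
# Note: paths containing spaces would be printed truncated.
check_copies() {
    awk -v want="$1" '
        BEGIN { bad = 0 }
        # Header lines look like: "whereis PATH (3 copies)"
        $1 == "whereis" {
            n = $(NF - 1)
            sub(/^\(/, "", n)
            if (n + 0 != want + 0) { print $2 ": " n " copies"; bad = 1 }
        }
        END { exit bad }
    '
}
```

Something like `git annex whereis | check_copies 1` would then cover the whole dataset instead of one directory.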
Here's the upload for the other dataset:
[kousu@requiem data-single-subject]$ git config --unset remote.internal.annex-uuid
[kousu@requiem data-single-subject]$ git push --all internal
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github':
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github':
Initialized empty Git repository in /srv/git/repositories/datasets/data-single-subject.git/
Enumerating objects: 3169, done.
Counting objects: 100% (3169/3169), done.
Delta compression using up to 4 threads
Compressing objects: 100% (1593/1593), done.
Writing objects: 100% (3169/3169), 286.13 KiB | 14.31 MiB/s, done.
Total 3169 (delta 1570), reused 1979 (delta 829), pack-reused 0
remote: Resolving deltas: 100% (1570/1570), done.
To 132.207.65.204:datasets/data-single-subject.git
* [new branch] git-annex -> git-annex
* [new branch] master -> master
* [new branch] synced/master -> synced/master
[kousu@requiem data-single-subject]$ git config annex.sshcaching true # necessary to avoid
[kousu@requiem data-single-subject]$ time git annex copy --to internal
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github': I_r_labels-manual.nii.gz
(to internal...)
ok
copy derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...)
ok
copy derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (to internal...)
ok
copy sub-chiba750/anat/sub-chiba750_T1w.nii.gz (to internal...)
[...]
copy sub-unf/anat/sub-unf_T1w.nii.gz (to internal...)
ok
copy sub-unf/anat/sub-unf_T2star.nii.gz (to internal...)
ok
copy sub-unf/anat/sub-unf_T2w.nii.gz (to internal...)
ok
copy sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (to internal...)
ok
copy sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (to internal...)
ok
copy sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (to internal...)
ok
copy sub-unf/dwi/sub-unf_dwi.nii.gz (to internal...)
ok
(recording state in git...)
real 12m34.268s
user 0m6.017s
sys 0m4.351s
And to dead the repo:
[kousu@requiem data-single-subject]$ git annex whereis sub-unf/dwi/
whereis sub-unf/dwi/sub-unf_dwi.nii.gz (5 copies)
5ca3a9a5-ac75-410e-8dcd-8a24463f08fa -- julien@julien-macbook.local:~/code/spine-generic/data-single-subject
74ec6586-6ac2-4700-892e-56f55ac5544b
8aea80c3-2550-4340-8d36-42af5475c103 -- internal
c99162a2-3e7d-4100-82e7-1e077a0793f6 -- [amazon]
e80b53d8-6bf7-4996-a918-4c284c440217 -- kousu@requiem:~/src/neuropoly/datalad/data-single-subject [here]
amazon: https://data-single-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s2744719--1427176f437e7980602d3e8b355750df823f971e613acc3abcf1bc32b4944430.nii.gz
ok
Ah, there's actually my laptop and @jcohenadad's laptop listed here, in addition to the new repo (8aea80c3-2550-4340-8d36-42af5475c103) and the old one (74ec6586-6ac2-4700-892e-56f55ac5544b). Kill all of them except the new repo:
[kousu@requiem data-single-subject]$ git annex dead here
dead here (recording state in git...)
ok
(recording state in git...)
[kousu@requiem data-single-subject]$ git annex dead 5ca3a9a5-ac75-410e-8dcd-8a24463f08fa
dead 5ca3a9a5-ac75-410e-8dcd-8a24463f08fa ok
(recording state in git...)
But, funny, when I try the old repo's UUID:
[kousu@requiem data-single-subject]$ git annex dead 74ec6586-6ac2-4700-892e-56f55ac5544b
git-annex: there is no available git remote named "74ec6586-6ac2-4700-892e-56f55ac5544b"
Okay here's a very silly workaround:
[kousu@requiem data-single-subject]$ NEW=$(git config remote.internal.annex-uuid); echo $NEW
8aea80c3-2550-4340-8d36-42af5475c103
[kousu@requiem data-single-subject]$ git config remote.internal.annex-uuid 74ec6586-6ac2-4700-892e-56f55ac5544b
[kousu@requiem data-single-subject]$ git annex whereis sub-unf/dwi/
whereis sub-unf/dwi/sub-unf_dwi.nii.gz (3 copies)
74ec6586-6ac2-4700-892e-56f55ac5544b -- internal
8aea80c3-2550-4340-8d36-42af5475c103
c99162a2-3e7d-4100-82e7-1e077a0793f6 -- [amazon]
amazon: https://data-single-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s2744719--1427176f437e7980602d3e8b355750df823f971e613acc3abcf1bc32b4944430.nii.gz
ok
[kousu@requiem data-single-subject]$ git annex dead 74ec6586-6ac2-4700-892e-56f55ac5544b
dead 74ec6586-6ac2-4700-892e-56f55ac5544b ok
(recording state in git...)
[kousu@requiem data-single-subject]$ git config remote.internal.annex-uuid "$NEW"
[kousu@requiem data-single-subject]$ git annex whereis sub-unf/dwi/
whereis sub-unf/dwi/sub-unf_dwi.nii.gz (2 copies)
8aea80c3-2550-4340-8d36-42af5475c103 -- internal
c99162a2-3e7d-4100-82e7-1e077a0793f6 -- [amazon]
amazon: https://data-single-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s2744719--1427176f437e7980602d3e8b355750df823f971e613acc3abcf1bc32b4944430.nii.gz
ok
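The workaround above boils down to: save the remote's real annex-uuid, point the remote at the UUID that should be killed, run git annex dead, then put the real UUID back. A minimal sketch of that as a shell function; the name annex_dead_by_uuid is made up for illustration, and it assumes the remote already has an annex-uuid recorded in the local git config:

```shell
# Hypothetical helper (not part of git-annex): mark a UUID as dead by
# temporarily pointing an existing remote's annex-uuid at it.
annex_dead_by_uuid() {
    remote=$1      # name of an existing git remote, e.g. "internal"
    dead_uuid=$2   # UUID that "git annex dead" refuses to accept directly
    key="remote.$remote.annex-uuid"

    # Remember the remote's real UUID so it can be restored afterwards.
    saved=$(git config "$key") || return 1

    # Pretend the remote is the old repository, then mark it dead.
    git config "$key" "$dead_uuid" || return 1
    git annex dead "$dead_uuid"
    status=$?

    # Restore the real UUID whether or not "dead" succeeded.
    git config "$key" "$saved"
    return "$status"
}
```

Usage would look like `annex_dead_by_uuid internal 74ec6586-6ac2-4700-892e-56f55ac5544b`, followed by `git annex whereis` to confirm that only the live copies remain.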
And a final sync up to catch all the metadata branches:
[kousu@requiem data-single-subject]$ git annex sync --content internal
commit
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
ok
pull internal
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github':
ok
push internal
Enumerating objects: 818, done.
Counting objects: 100% (818/818), done.
Delta compression using up to 4 threads
Compressing objects: 100% (300/300), done.
Writing objects: 100% (442/442), 32.26 KiB | 971.00 KiB/s, done.
Total 442 (delta 292), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (292/292), completed with 243 local objects.
To 132.207.65.204:datasets/data-single-subject.git
* [new branch] git-annex -> synced/git-annex
ok
Check that everything looks good:
[kousu@requiem data-single-subject]$ ssh git@data.neuro.polymtl.ca
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github':
PTY allocation request failed on channel 0
hello nguenther, this is git@data running gitolite3 3.6.11-2 (Debian) on git 2.27.0
R W C CREATOR/..*
R W C datasets/..*
R W datasets/data-single-subject
R W datasets/sct-testing-large
R W gitolite-admin
Connection to data.neuro.polymtl.ca closed.
And check that on the server side, the sizes are what I remember:
nguenther@data:~/datasets/sct-testing-large$ sudo -i -u git bash
[sudo] password for nguenther:
git@data:~$ cd repositories
git@data:~/repositories$ du -hs datasets/*
884M datasets/data-single-subject.git
19G datasets/sct-testing-large.git
git@data:~/repositories$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 1007G 20G 937G 2% /srv/git/repositories
See #22.
Jean-Sébastien Décarie has said he can give us access to the server's console; HyperV consoles run over RDP (unlike QEMU's, which run over VNC); supposedly this command line will let a Linux user connect:
xfreerdp --ignore-certificate --no-nego -t 2179 -u $username --pcb $vmid $hypervhost
However, JS said he is not ready to grant us the rights to this: the HyperV network is not segregated enough. He thinks it is possible but it will take him some time to redesign, so we shouldn't hold our breath.
This was Polytechnique Ticket/8478, but it's closed for now.
The only noticeable fallout seems to be that, as after a bad force-push, everyone who had copies of the repos needs to acknowledge the glitch. Here, that means running:
git config --unset remote.origin.annex-uuid
for each repo they had checked out. This was just me, Alex, Julien and Konstantinos, and I've walked all of them through fixing it.
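For anyone with several checkouts, the same fix can be looped. This is only a sketch: the function name forget_cached_uuid is made up, it assumes the remote is called origin as in the command above, and it relies on git-annex looking up the remote's UUID again the next time it talks to it.

```shell
# Hypothetical helper: drop the cached annex-uuid for "origin" in each
# repository path given as an argument.
forget_cached_uuid() {
    for repo in "$@"; do
        # "git config --unset" exits non-zero when the key is absent,
        # so report that case instead of treating it as fatal.
        if git -C "$repo" config --unset remote.origin.annex-uuid; then
            echo "cleared cached annex-uuid in $repo"
        else
            echo "nothing to clear (or not a git repo): $repo"
        fi
    done
}

# Example (paths are placeholders):
#   forget_cached_uuid ~/src/data-single-subject ~/src/sct-testing-large
```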
I'll reopen if I find more glitches but I think this is done.
To better handle this in the future, we now have https://monitor.neuro.polymtl.ca (https://github.com/neuropoly/computers/issues/4).
2020-11-14
On November 14th the server data.neuro.polymtl.ca went down and did not come back up until December 2nd. I believe it was specifically 2am November 14th, the scheduled time for unattended-upgrades, but I haven't totally confirmed that.
Here are the last messages I received from the server; I'm not sure why there's two of them, they look like they're both part of the same upgrade:
After the reboot it was inaccessible.
2020-11-17
Here's a screenshot of the boot console (sorry for not transcribing it for accessibility)
Basically, it seems that /dev/sdb, the terabyte storage disk that was recently added to the system, has become corrupted or inaccessible.
Since I put this disk directly into /etc/fstab, that means the boot is now broken.
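A rough way to spot fstab entries that can break boot like this is to list every non-root mount whose options include neither nofail nor noauto (the latter being half of the noauto,x-systemd.automount fix suggested at the top of this thread). The sketch below is illustrative only: the function name risky_mounts and the sample fstab contents are made up, and it assumes the usual whitespace-separated fstab columns.

```shell
# Hypothetical helper: print mount points that the boot would wait on.
# Reads /etc/fstab by default, or the file given as the first argument.
risky_mounts() {
    awk '
        $1 ~ /^#/ || NF == 0 { next }        # skip comments and blank lines
        $2 == "/" || $3 == "swap" { next }   # root and swap are not checked here
        {
            n = split($4, opts, ",")
            safe = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "nofail" || opts[i] == "noauto") safe = 1
            if (!safe) print $2
        }
    ' "${1:-/etc/fstab}"
}

# Demo on a made-up fstab (devices and filesystem types are illustrative).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# <device>  <mount point>          <type> <options>          <dump> <pass>
/dev/sda2   /                      ext4   errors=remount-ro  0      1
/dev/sdb1   /srv/git/repositories  ext4   defaults           0      2
/dev/sdd1   /mnt/scratch           ext4   defaults,nofail    0      2
EOF
risky_mounts "$sample"   # prints /srv/git/repositories
rm -f "$sample"
```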
2020-12-02
Finally we were able to get together yesterday with Jean-Sébastien Décarie to investigate the server.
Recovering access
An immediate stumbling block was that no one knew the root password. The Ubuntu installer set up an account with sudo rights, and the root password was never recorded. In normal operation that's fine, maybe even desirable, but the systemd rescue shell insists on taking the root password.
We tried booting with init=/bin/bash instead of init=/sbin/init, but it (and variations on it) just led to a hung server. Jean-Sébastien found an Ubuntu-specific guide but I don't know the link and anyway it wasn't any more informative.
sudo mkdir -p /mnt/root && sudo mount /dev/sda2 /mnt/root
vi /mnt/root/etc/fstab
#-> comment out the line for /srv/git/repositories
Ensuring future access:
xkcdpass | pass insert root@data.neuro.polymtl.ca
(or equivalent password manager)
sudo passwd root
and input the new password root@data.neuro.polymtl.ca
Debugging 1TB storage disk
Before moving on, I wanted to investigate what's wrong with the storage disk, to see if we can recover it and maybe understand what went wrong so we can avoid it.
One thing to note about this is that the VM is running on Microsoft HyperV, and the attached disk is a physical 1TB disk in passthrough mode, not a virtual disk.
That's not good :/
The partition table looks okay.
During that fsck attempt:
badblocks
Read-only scan:
So reads are working, or at least doing something?
But yet:
The log is pretty noisy because, it seems, every single block is bad, i.e. it's not storing the data requested. In addition, some of them return I/O errors during write; we can focus on them like this:
I am suspicious. It seems like it's every 64th block that's giving an exception. I'll confirm that with this:
So, indeed, every single "Input/output error" line is on a specific boundary. Now, these aren't usual sized blocks. I followed what fdisk reported, and used -b 32768, which is the same as using 64x the usual 512-byte sized blocks, so that means these errors are actually happening every 32768B*64 = 2MiB.
So every 2MiB the disk IO stack freaks out, and in between writes are silently failing.
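The arithmetic, and the "every 64th block" check, can be sketched as below. The block numbers fed in at the end are hypothetical placeholders rather than values from the real badblocks log, and the helper name off_stride_gaps is made up; it only checks the spacing between consecutive error blocks, so it makes no assumption about where the first one falls.

```shell
block_size=32768    # the -b value passed to badblocks, in bytes
sector_size=512     # the usual sector size
stride_blocks=64    # observed spacing between I/O errors, in blocks

echo "one badblocks block = $((block_size / sector_size)) sectors"
echo "error stride = $((block_size * stride_blocks)) bytes = $((block_size * stride_blocks / 1024 / 1024)) MiB"

# Read one block number per line on stdin and print every consecutive
# pair whose distance is NOT a multiple of the stride given as $1.
off_stride_gaps() {
    awk -v s="$1" 'NR > 1 && ($1 - prev) % s != 0 { print prev, $1 }
                   { prev = $1 }'
}

# Hypothetical block numbers; no output means they all sit on the stride.
printf '%s\n' 63 127 191 255 | off_stride_gaps "$stride_blocks"
```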
I have to think this has something to do with combining Microsoft's HyperV hypervisor, the pass-through driver, and linux. Something in that stack is angry at the other parts. It is possible that the upgrade (maybe linux-image-azure?) is buggy with regard to the version of HyperV deployed at Polytechnique.
I think the best solution is to not push HyperV that hard. Let's just switch to using a fully virtual storage disk and migrate to that, and make a backup server (#20).