neuropoly / data-management

Repo that deals with datalad aspects for internal use

data.neuro.polymtl.ca november 2020 outage #21

Closed kousu closed 3 years ago

kousu commented 3 years ago

2020-11-14

On November 14th, the server data.neuro.polymtl.ca went down and did not come back up until December 2nd. I believe it went down at 2am on November 14th, the scheduled time for unattended-upgrades, but I haven't confirmed that.

Here are the last messages I received from the server. I'm not sure why there are two of them; they look like they're both part of the same upgrade:

```
Return-Path: root@data.neuro.polymtl.ca
Delivered-To: nick@kousu.ca
Received: from data.neuro.polymtl.ca (donnees.neuro.polymtl.ca [132.207.65.204]) by comms.kousu.ca (OpenSMTPD) with ESMTPS id a5d0bd6b (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO) for ; Fri, 13 Nov 2020 11:36:36 +0000 (UTC)
Received: from localhost (data.neuro.polymtl.ca [local]) by data.neuro.polymtl.ca (OpenSMTPD) with ESMTPA id bc0570e4 for ; Fri, 13 Nov 2020 11:36:35 +0000 (UTC)
Date: Fri, 13 Nov 2020 06:36:35 -0500 (EST)
Subject: unattended-upgrades result for data.neuro.polymtl.ca: SUCCESS
From: root@data.neuro.polymtl.ca
To: root@localhost
Auto-Submitted: auto-generated
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-ID: <8cc2d1dc25799b4d@data.neuro.polymtl.ca>
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

Unattended upgrade result: All upgrades installed
Packages that were upgraded: apport intel-microcode libmaxminddb0 python3-apport python3-problem-report

Package installation log:
Log started: 2020-11-13 06:35:41
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Preparing to unpack .../python3-problem-report_2.20.11-0ubuntu50.1_all.deb ...
Unpacking python3-problem-report (2.20.11-0ubuntu50.1) over (2.20.11-0ubuntu50) ...
Setting up python3-problem-report (2.20.11-0ubuntu50.1) ...
Log ended: 2020-11-13 06:36:03
Log started: 2020-11-13 06:36:03
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Preparing to unpack .../intel-microcode_3.20201110.0ubuntu0.20.10.2_amd64.deb ...
Unpacking intel-microcode (3.20201110.0ubuntu0.20.10.2) over (3.20201110.0ubuntu0.20.10.1) ...
Setting up intel-microcode (3.20201110.0ubuntu0.20.10.2) ...
update-initramfs: deferring update (trigger activated)
intel-microcode: microcode will be updated at next boot
Processing triggers for initramfs-tools (0.137ubuntu12) ...
update-initramfs: Generating /boot/initrd.img-5.8.0-1012-azure
Log ended: 2020-11-13 06:36:18
Log started: 2020-11-13 06:36:18
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Preparing to unpack .../libmaxminddb0_1.4.2-0ubuntu1.20.10.1_amd64.deb ...
Unpacking libmaxminddb0:amd64 (1.4.2-0ubuntu1.20.10.1) over (1.4.2-0ubuntu1) ...
Setting up libmaxminddb0:amd64 (1.4.2-0ubuntu1.20.10.1) ...
Processing triggers for man-db (2.9.3-2) ...
Processing triggers for libc-bin (2.32-0ubuntu3) ...
Log ended: 2020-11-13 06:36:21
Log started: 2020-11-13 06:36:22
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Preparing to unpack .../python3-apport_2.20.11-0ubuntu50.1_all.deb ...
Unpacking python3-apport (2.20.11-0ubuntu50.1) over (2.20.11-0ubuntu50) ...
Preparing to unpack .../apport_2.20.11-0ubuntu50.1_all.deb ...
Unpacking apport (2.20.11-0ubuntu50.1) over (2.20.11-0ubuntu50) ...
Setting up python3-apport (2.20.11-0ubuntu50.1) ...
Setting up apport (2.20.11-0ubuntu50.1) ...
apport-autoreport.service is a disabled or a static unit, not starting it.
Processing triggers for systemd (246.6-1ubuntu1) ...
Processing triggers for man-db (2.9.3-2) ...
Processing triggers for ureadahead (0.100.0-21) ...
Log ended: 2020-11-13 06:36:33

Unattended-upgrades log:
Starting unattended upgrades script
Allowed origins are: o=Ubuntu,a=groovy, o=Ubuntu,a=groovy-security, o=UbuntuESMApps,a=groovy-apps-security, o=UbuntuESM,a=groovy-infra-security, o=Ubuntu,a=groovy-updates, o=Ubuntu,a=groovy-backports
Initial blacklist:
Initial whitelist (not strict):
Packages that will be upgraded: apport intel-microcode libmaxminddb0 python3-apport python3-problem-report
Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
All upgrades installed
```

```
Return-Path: root@data.neuro.polymtl.ca
Delivered-To: nick@kousu.ca
Received: from data.neuro.polymtl.ca (data.neuro.polymtl.ca [132.207.65.204]) by comms.kousu.ca (OpenSMTPD) with ESMTPS id 6e6ee5f1 (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO) for ; Thu, 12 Nov 2020 11:40:24 +0000 (UTC)
Received: from localhost (data.neuro.polymtl.ca [local]) by data.neuro.polymtl.ca (OpenSMTPD) with ESMTPA id a3c085a3 for ; Thu, 12 Nov 2020 11:40:23 +0000 (UTC)
Date: Thu, 12 Nov 2020 06:40:23 -0500 (EST)
Subject: [reboot required] unattended-upgrades result for data.neuro.polymtl.ca: SUCCESS
From: root@data.neuro.polymtl.ca
To: root@localhost
Auto-Submitted: auto-generated
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-ID: <04fe05da066a4360@data.neuro.polymtl.ca>
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

Unattended upgrade result: All upgrades installed
Warning: A reboot is required to complete this upgrade, or a previous one.
Packages that were upgraded: intel-microcode linux-azure linux-cloud-tools-azure linux-cloud-tools-common linux-headers-azure linux-image-azure linux-tools-azure linux-tools-common

Package installation log:
Log started: 2020-11-12 06:38:44
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Selecting previously unselected package linux-modules-5.8.0-1012-azure.
Preparing to unpack .../00-linux-modules-5.8.0-1012-azure_5.8.0-1012.13_amd64.deb ...
Unpacking linux-modules-5.8.0-1012-azure (5.8.0-1012.13) ...
Selecting previously unselected package linux-image-5.8.0-1012-azure.
Preparing to unpack .../01-linux-image-5.8.0-1012-azure_5.8.0-1012.13_amd64.deb ...
Unpacking linux-image-5.8.0-1012-azure (5.8.0-1012.13) ...
Preparing to unpack .../02-linux-azure_5.8.0.1012.12_amd64.deb ...
Unpacking linux-azure (5.8.0.1012.12) over (5.8.0.1011.11) ...
Preparing to unpack .../03-linux-image-azure_5.8.0.1012.12_amd64.deb ...
Unpacking linux-image-azure (5.8.0.1012.12) over (5.8.0.1011.11) ...
Selecting previously unselected package linux-azure-headers-5.8.0-1012.
Preparing to unpack .../04-linux-azure-headers-5.8.0-1012_5.8.0-1012.13_all.deb ...
Unpacking linux-azure-headers-5.8.0-1012 (5.8.0-1012.13) ...
Selecting previously unselected package linux-headers-5.8.0-1012-azure.
Preparing to unpack .../05-linux-headers-5.8.0-1012-azure_5.8.0-1012.13_amd64.deb ...
Unpacking linux-headers-5.8.0-1012-azure (5.8.0-1012.13) ...
Preparing to unpack .../06-linux-headers-azure_5.8.0.1012.12_amd64.deb ...
Unpacking linux-headers-azure (5.8.0.1012.12) over (5.8.0.1011.11) ...
Selecting previously unselected package linux-azure-tools-5.8.0-1012.
Preparing to unpack .../07-linux-azure-tools-5.8.0-1012_5.8.0-1012.13_amd64.deb ...
Unpacking linux-azure-tools-5.8.0-1012 (5.8.0-1012.13) ...
Selecting previously unselected package linux-tools-5.8.0-1012-azure.
Preparing to unpack .../08-linux-tools-5.8.0-1012-azure_5.8.0-1012.13_amd64.deb ...
Unpacking linux-tools-5.8.0-1012-azure (5.8.0-1012.13) ...
Preparing to unpack .../09-linux-tools-azure_5.8.0.1012.12_amd64.deb ...
Unpacking linux-tools-azure (5.8.0.1012.12) over (5.8.0.1011.11) ...
Selecting previously unselected package linux-azure-cloud-tools-5.8.0-1012.
Preparing to unpack .../10-linux-azure-cloud-tools-5.8.0-1012_5.8.0-1012.13_amd64.deb ...
Unpacking linux-azure-cloud-tools-5.8.0-1012 (5.8.0-1012.13) ...
Selecting previously unselected package linux-cloud-tools-5.8.0-1012-azure.
Preparing to unpack .../11-linux-cloud-tools-5.8.0-1012-azure_5.8.0-1012.13_amd64.deb ...
Unpacking linux-cloud-tools-5.8.0-1012-azure (5.8.0-1012.13) ...
Preparing to unpack .../12-linux-cloud-tools-azure_5.8.0.1012.12_amd64.deb ...
Unpacking linux-cloud-tools-azure (5.8.0.1012.12) over (5.8.0.1011.11) ...
Setting up linux-modules-5.8.0-1012-azure (5.8.0-1012.13) ...
Setting up linux-azure-cloud-tools-5.8.0-1012 (5.8.0-1012.13) ...
Setting up linux-azure-headers-5.8.0-1012 (5.8.0-1012.13) ...
Setting up linux-azure-tools-5.8.0-1012 (5.8.0-1012.13) ...
Setting up linux-image-5.8.0-1012-azure (5.8.0-1012.13) ...
I: /boot/vmlinuz is now a symlink to vmlinuz-5.8.0-1012-azure
I: /boot/initrd.img is now a symlink to initrd.img-5.8.0-1012-azure
Setting up linux-cloud-tools-5.8.0-1012-azure (5.8.0-1012.13) ...
Setting up linux-headers-5.8.0-1012-azure (5.8.0-1012.13) ...
Setting up linux-tools-5.8.0-1012-azure (5.8.0-1012.13) ...
Setting up linux-headers-azure (5.8.0.1012.12) ...
Setting up linux-image-azure (5.8.0.1012.12) ...
Setting up linux-tools-azure (5.8.0.1012.12) ...
Setting up linux-cloud-tools-azure (5.8.0.1012.12) ...
Setting up linux-azure (5.8.0.1012.12) ...
Processing triggers for linux-image-5.8.0-1012-azure (5.8.0-1012.13) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.8.0-1012-azure
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.8.0-1012-azure
Found initrd image: /boot/initrd.img-5.8.0-1012-azure
Found linux image: /boot/vmlinuz-5.8.0-1011-azure
Found initrd image: /boot/initrd.img-5.8.0-1011-azure
Adding boot menu entry for UEFI Firmware Settings
done
[master 3c027c9] committing changes in /etc made by "/usr/bin/python3 /usr/bin/unattended-upgrade"
 1 file changed, 37 insertions(+), 55 deletions(-)
 rewrite apt/apt.conf.d/01autoremove-kernels (83%)
Log ended: 2020-11-12 06:39:44
Log started: 2020-11-12 06:39:45
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Preparing to unpack .../linux-cloud-tools-common_5.8.0-28.30_all.deb ...
Unpacking linux-cloud-tools-common (5.8.0-28.30) over (5.8.0-26.27) ...
Setting up linux-cloud-tools-common (5.8.0-28.30) ...
Processing triggers for ureadahead (0.100.0-21) ...
Processing triggers for man-db (2.9.3-2) ...
Log ended: 2020-11-12 06:39:56
Log started: 2020-11-12 06:39:57
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Preparing to unpack .../intel-microcode_3.20201110.0ubuntu0.20.10.1_amd64.deb ...
Unpacking intel-microcode (3.20201110.0ubuntu0.20.10.1) over (3.20200609.0ubuntu0.20.04.2) ...
Setting up intel-microcode (3.20201110.0ubuntu0.20.10.1) ...
update-initramfs: deferring update (trigger activated)
intel-microcode: microcode will be updated at next boot
Processing triggers for initramfs-tools (0.137ubuntu12) ...
update-initramfs: Generating /boot/initrd.img-5.8.0-1012-azure
Log ended: 2020-11-12 06:40:07
Log started: 2020-11-12 06:40:08
apt-listchanges: Reading changelogs...
apt-listchanges: Reading changelogs...
Preparing to unpack .../linux-tools-common_5.8.0-28.30_all.deb ...
Unpacking linux-tools-common (5.8.0-28.30) over (5.8.0-26.27) ...
Setting up linux-tools-common (5.8.0-28.30) ...
Processing triggers for man-db (2.9.3-2) ...
Log ended: 2020-11-12 06:40:22

Unattended-upgrades log:
Starting unattended upgrades script
Allowed origins are: o=Ubuntu,a=groovy, o=Ubuntu,a=groovy-security, o=UbuntuESMApps,a=groovy-apps-security, o=UbuntuESM,a=groovy-infra-security, o=Ubuntu,a=groovy-updates, o=Ubuntu,a=groovy-backports
Initial blacklist:
Initial whitelist (not strict):
Packages that will be upgraded: intel-microcode linux-azure linux-cloud-tools-azure linux-cloud-tools-common linux-headers-azure linux-image-azure linux-tools-azure linux-tools-common
Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
All upgrades installed
```

After the reboot it was inaccessible.

2020-11-17

Here's a screenshot of the boot console (sorry for not transcribing it for accessibility)

[Image: boot console screenshot (image003)]

Basically, it seems that /dev/sdb, the terabyte storage disk that was recently added to the system, has become corrupted or inaccessible.

Since I put this disk directly into /etc/fstab, that means the boot is now broken.

2020-12-02

Finally we were able to get together yesterday with Jean-Sébastien Décarie to investigate the server.

Recovering access

An immediate stumbling block was that no one knew the root password. The Ubuntu installer set up an account with sudo rights, and the root password was never recorded. In normal operation that's fine, maybe even desirable, but the systemd rescue shell insists on taking the root password.

  1. Attempted to follow https://linuxconfig.org/recover-reset-forgotten-linux-root-password (which recommends booting with init=/bin/bash instead of init=/sbin/init), but it (and variations on it) just led to a hung server. Jean-Sébastien found an Ubuntu-specific guide, but I don't have the link, and it wasn't any more informative anyway.
  2. Boot with Ubuntu installer .iso that was used to install the system in the first place.
  3. Open a Terminal
  4. sudo mkdir -p /mnt/root && sudo mount /dev/sda2 /mnt/root
  5. vi /mnt/root/etc/fstab #-> comment out the line for /srv/git/repositories
  6. Reboot
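
Step 5 above was done by hand in vi. A minimal sketch of the same edit as a script, assuming a hypothetical helper named `comment_out_mount` and run here against a throwaway file with made-up entries rather than the real /mnt/root/etc/fstab:

```shell
#!/bin/sh
# Hypothetical helper: comment out every active fstab entry for one mount point.
# Usage: comment_out_mount <fstab-file> <mount-point>
comment_out_mount() {
    fstab=$1 mountpoint=$2
    tmp=$(mktemp) || return 1
    # Field 2 of an fstab entry is the mount point; lines that are
    # already comments are passed through unchanged.
    awk -v mp="$mountpoint" '
        $0 !~ /^[ \t]*#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    rm -f "$tmp"
}

# Demonstration on a throwaway file (entries are invented for the example).
demo=$(mktemp)
cat > "$demo" <<'EOF'
UUID=aaaa / ext4 errors=remount-ro 0 1
/dev/sdb1 /srv/git/repositories ext4 defaults 0 2
EOF
comment_out_mount "$demo" /srv/git/repositories
cat "$demo"
```

On the rescue system the first argument would be /mnt/root/etc/fstab, and it is worth keeping a copy of the file before editing it.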

Ensuring future access:

  1. Locally: xkcdpass | pass insert root@data.neuro.polymtl.ca (or equivalent password manager)
  2. Remotely: run sudo passwd root and enter the new password stored under root@data.neuro.polymtl.ca
  3. Give the root password to @jcohenadad
  4. Give the root password to @alexfoias

Debugging 1TB storage disk

Before moving on, I wanted to investigate what's wrong with the storage disk, to see if we can recover it and maybe understand what went wrong so we can avoid it.

One thing to note is that the VM is running on Microsoft HyperV, and the attached disk is a physical 1TB disk in passthrough mode, not a virtual disk.

  1. Basic reconnaissance
root@data:/home/nguenther# mount /dev/sdb1 /srv/git/repositories
mount: /srv/git/repositories: can't read superblock on /dev/sdb1.

That's not good :/

root@data:/home/nguenther# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.36).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/sdb: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Disk model: 2145            
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: gpt
Disk identifier: F39B8299-4E4E-8C4B-96BF-758F07539380

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 2147483614 2147481567 1024G Linux filesystem

The partition table looks okay.

root@data:/home/nguenther# e2fsck /dev/sdb1
e2fsck 1.45.6 (20-Mar-2020)
neuropoly-data: recovering journal
e2fsck: Input/output error while recovering journal of neuropoly-data
e2fsck: unable to set superblock flags on neuropoly-data

neuropoly-data: ********** WARNING: Filesystem still has errors **********

  2. Digging into errors

During that fsck attempt:

root@data:/home/nguenther# dmesg
[...]
[  798.481918] blk_update_request: I/O error, dev sdb, sector 2888 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
[  798.482003] buffer_io_error: 2697 callbacks suppressed
[  798.482006] Buffer I/O error on dev sdb1, logical block 840, lost async page write
[  798.482064] Buffer I/O error on dev sdb1, logical block 841, lost async page write
[  798.482120] Buffer I/O error on dev sdb1, logical block 842, lost async page write
[  798.482196] Buffer I/O error on dev sdb1, logical block 843, lost async page write
[  798.482252] Buffer I/O error on dev sdb1, logical block 844, lost async page write
[  798.482318] Buffer I/O error on dev sdb1, logical block 845, lost async page write
[  798.482375] Buffer I/O error on dev sdb1, logical block 846, lost async page write
[  798.482430] Buffer I/O error on dev sdb1, logical block 847, lost async page write
[  798.482505] sd 0:0:0:2: [sdb] tag#311 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.482509] sd 0:0:0:2: [sdb] tag#311 CDB: Write(10) 2a 00 0d 80 09 00 00 00 08 00
[  798.482512] blk_update_request: I/O error, dev sdb, sector 226494720 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
[  798.482589] Buffer I/O error on dev sdb1, logical block 226492672, lost async page write
[  798.482647] Buffer I/O error on dev sdb1, logical block 226492673, lost async page write
[  798.482716] sd 0:0:0:2: [sdb] tag#310 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.482719] sd 0:0:0:2: [sdb] tag#310 CDB: Write(10) 2a 00 00 01 2d 08 00 00 08 00
[  798.482722] blk_update_request: I/O error, dev sdb, sector 77064 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
[  798.482806] sd 0:0:0:2: [sdb] tag#309 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.482809] sd 0:0:0:2: [sdb] tag#309 CDB: Write(10) 2a 00 00 00 09 08 00 00 08 00
[  798.482811] blk_update_request: I/O error, dev sdb, sector 2312 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
[  798.482956] sd 0:0:0:2: [sdb] tag#308 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.482959] sd 0:0:0:2: [sdb] tag#308 CDB: Write(10) 2a 00 0d c0 1e a0 00 04 00 00
[  798.482962] blk_update_request: I/O error, dev sdb, sector 230694560 op 0x1:(WRITE) flags 0x4800 phys_seg 1024 prio class 0
[  798.483564] sd 0:0:0:2: [sdb] tag#307 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.483567] sd 0:0:0:2: [sdb] tag#307 CDB: Write(10) 2a 00 0d c0 0e a0 00 04 00 00
[  798.483569] blk_update_request: I/O error, dev sdb, sector 230690464 op 0x1:(WRITE) flags 0x4800 phys_seg 1024 prio class 0
[  798.484169] sd 0:0:0:2: [sdb] tag#305 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.484171] sd 0:0:0:2: [sdb] tag#305 CDB: Write(10) 2a 00 0d c0 12 a0 00 04 00 00
[  798.484174] blk_update_request: I/O error, dev sdb, sector 230691488 op 0x1:(WRITE) flags 0x4800 phys_seg 1024 prio class 0
[  798.484787] sd 0:0:0:2: [sdb] tag#306 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.484789] sd 0:0:0:2: [sdb] tag#306 CDB: Write(10) 2a 00 0d c0 16 a0 00 04 00 00
[  798.484792] blk_update_request: I/O error, dev sdb, sector 230692512 op 0x1:(WRITE) flags 0x4800 phys_seg 1024 prio class 0
[  798.485384] sd 0:0:0:2: [sdb] tag#302 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.485387] sd 0:0:0:2: [sdb] tag#302 CDB: Write(10) 2a 00 0d c0 1a a0 00 04 00 00
[  798.485389] blk_update_request: I/O error, dev sdb, sector 230693536 op 0x1:(WRITE) flags 0x4800 phys_seg 1024 prio class 0
[  798.485962] sd 0:0:0:2: [sdb] tag#303 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[  798.485965] sd 0:0:0:2: [sdb] tag#303 CDB: Write(10) 2a 00 00 00 08 c8 00 00 08 00
[  798.485968] blk_update_request: I/O error, dev sdb, sector 2248 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
[...]
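
The `CDB: Write(10)` lines in this dmesg output carry the target address of each failed write. A small sketch that decodes them, assuming the standard Write(10) layout (opcode 0x2a, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8); the function name `decode_write10` is made up for this example:

```shell
#!/bin/sh
# Decode the CDB bytes the kernel prints for a failed SCSI Write(10).
# Prints "<LBA> <transfer length in logical blocks>".
decode_write10() {
    # Intentionally unquoted: split the hex string into ten positional parameters.
    set -- $1
    if [ "$#" -ne 10 ] || [ "$1" != "2a" ]; then
        echo "not a Write(10) CDB" >&2
        return 1
    fi
    lba=$(( 0x$3$4$5$6 ))      # bytes 2-5, big-endian
    blocks=$(( 0x$8$9 ))       # bytes 7-8, big-endian
    echo "$lba $blocks"
}

# Two CDBs copied from the dmesg output above.
decode_write10 "2a 00 0d 80 09 00 00 00 08 00"
decode_write10 "2a 00 0d c0 1e a0 00 04 00 00"
```

The decoded LBAs, 226494720 and 230694560, match the `sector` values in the corresponding blk_update_request lines. Subtracting the partition start (2048, from the fdisk output) from 226494720 gives 226492672, the logical block reported on sdb1.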

  3. Scanning with badblocks

Read-only scan:

root@data:~# time badblocks -sv -b 32768 /dev/sdb
Checking blocks 0 to 33554431
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

real    18m0.506s
user    0m2.672s
sys 1m25.659s

So reads are working, or at least doing something?

But yet:

root@data:~# e2fsck /dev/sdb1
e2fsck 1.45.6 (20-Mar-2020)
neuropoly-data: recovering journal
Superblock needs_recovery flag is clear, but journal has data.
Run journal anyway<y>? yes
e2fsck: Input/output error while recovering journal of neuropoly-data
e2fsck: unable to set superblock flags on neuropoly-data

neuropoly-data: ********** WARNING: Filesystem still has errors **********
root@data:~# dmesg
[...]
[ 2233.819080] buffer_io_error: 21662 callbacks suppressed
[ 2233.819091] Buffer I/O error on dev sdb1, logical block 1069809664, lost async page write
[ 2233.819103] Buffer I/O error on dev sdb1, logical block 1069809665, lost async page write
[ 2233.819116] Buffer I/O error on dev sdb1, logical block 1069809666, lost async page write
[ 2233.819129] Buffer I/O error on dev sdb1, logical block 1069809667, lost async page write
[ 2233.819142] Buffer I/O error on dev sdb1, logical block 1069809668, lost async page write
[ 2233.819154] Buffer I/O error on dev sdb1, logical block 1069809669, lost async page write
[ 2233.819167] Buffer I/O error on dev sdb1, logical block 1069809670, lost async page write
[ 2233.819179] Buffer I/O error on dev sdb1, logical block 1069809671, lost async page write
[ 2233.819354] sd 0:0:0:2: [sdb] tag#117 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 2233.819369] sd 0:0:0:2: [sdb] tag#117 CDB: Write(10) 2a 00 00 00 08 08 00 04 00 00
[ 2233.819385] blk_update_request: I/O error, dev sdb, sector 2056 op 0x1:(WRITE) flags 0x800 phys_seg 1024 prio class 0
[ 2233.819398] Buffer I/O error on dev sdb1, logical block 8, lost async page write
[ 2233.819412] Buffer I/O error on dev sdb1, logical block 9, lost async page write
[ 2234.070920] sd 0:0:0:2: [sdb] tag#89 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[ 2234.070958] sd 0:0:0:2: [sdb] tag#89 CDB: Write(10) 2a 00 3f c4 08 00 00 00 08 00
[ 2234.070972] blk_update_request: I/O error, dev sdb, sector 1069811712 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
root@data:~# time badblocks -svn -b 32768 /dev/sdb 2>&1 | tee ~/dev-sdb2-badblocks-n.log
Checking for bad blocks in non-destructive read-write mode
From block 0 to 33554431
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: badblocks: Input/output error during test data write, block 0
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
[...]
badblocks: Input/output error during test data write, block 64
64
65
66
67
68
69
70
71
72
73
74
[...]
119
120
121
122
123
124
125
126
127
badblocks: Input/output error during test data write, block 128
128
129
130
131
132
[...]
33554430
33554431
done                                                 
Pass completed, 33554432 bad blocks found. (0/0/33554432 errors)

real    1612m28.915s
user    1m29.985s
sys 10m56.125s

The log is pretty noisy because, it seems, every single block is bad, i.e. it's not storing the data requested. In addition, some of them return I/O errors during write; we can focus on them like this:

root@data:~# cat dev-sdb2-badblocks-n.log | egrep 'Input/output error .* block [[:digit:]]+'
Testing with random pattern: badblocks: Input/output error during test data write, block 0
badblocks: Input/output error during test data write, block 64
badblocks: Input/output error during test data write, block 128
badblocks: Input/output error during test data write, block 192
badblocks: Input/output error during test data write, block 256
badblocks: Input/output error during test data write, block 320
badblocks: Input/output error during test data write, block 384
badblocks: Input/output error during test data write, block 448
badblocks: Input/output error during test data write, block 512
badblocks: Input/output error during test data write, block 576
badblocks: Input/output error during test data write, block 640
badblocks: Input/output error during test data write, block 704
badblocks: Input/output error during test data write, block 768
badblocks: Input/output error during test data write, block 832
badblocks: Input/output error during test data write, block 896
badblocks: Input/output error during test data write, block 960
[...]

I am suspicious. It seems like it's every 64th block that's giving an exception. I'll confirm that with this:

root@data:~# cat dev-sdb2-badblocks-n.log | egrep 'Input/output error .* block [[:digit:]]+' | egrep -o '[[:digit:]]+$' | while read block; do echo $(($block % 64)); done | sort | uniq -c
 524268 0
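
The same tally can be done in a single awk pass instead of a shell loop. This is only a sketch: `tally_mod64` is a hypothetical name, and the sample lines stand in for the real log.

```shell
#!/bin/sh
# Hypothetical helper: count I/O-error lines by (block number mod 64).
# Reads a badblocks log on stdin and prints "<count> <remainder>" pairs.
tally_mod64() {
    awk '/Input\/output error .* block [0-9]+$/ {
             count[$NF % 64]++          # $NF is the trailing block number
         }
         END { for (r in count) print count[r], r }' | sort -k2,2n
}

# A few lines in the same format as the real log.
sample='Testing with random pattern: badblocks: Input/output error during test data write, block 0
64
badblocks: Input/output error during test data write, block 64
badblocks: Input/output error during test data write, block 128'

printf '%s\n' "$sample" | tally_mod64
```

Against the real log the invocation would be `tally_mod64 < dev-sdb2-badblocks-n.log`; a single output row with remainder 0 means every error sits on a 64-block boundary.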

So, indeed, every single "Input/output error" line is on a specific boundary. Now, these aren't the usual block size. I followed what fdisk reported and used -b 32768, which is 64x the usual 512-byte block, so these errors are actually happening every 32768B*64 = 2MiB.
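
The arithmetic can be checked directly in the shell. All figures below are copied from the fdisk and badblocks output in this thread; only the variable names are invented.

```shell
#!/bin/sh
disk_bytes=1099511627776    # size of /dev/sdb reported by fdisk
block_bytes=32768           # the -b value passed to badblocks
sector_bytes=512            # logical sector size reported by fdisk
error_stride=64             # an I/O error was logged on every 64th block

total_blocks=$(( disk_bytes / block_bytes ))
sectors_per_block=$(( block_bytes / sector_bytes ))
stride_bytes=$(( block_bytes * error_stride ))
expected_errors=$(( total_blocks / error_stride ))

echo "blocks tested:          $total_blocks"
echo "512-byte sectors/block: $sectors_per_block"
echo "error stride in bytes:  $stride_bytes"
echo "errors if every 64th:   $expected_errors"
```

This gives 33554432 blocks (matching "From block 0 to 33554431"), a stride of 2097152 bytes (2 MiB), and 524288 expected error lines. The log actually contained 524268, slightly fewer; nothing in the output explains the difference.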

So every 2MiB the disk IO stack freaks out, and in between writes are silently failing.

I have to think this has something to do with combining Microsoft's HyperV hypervisor, the pass-through driver, and linux. Something in that stack is angry at the other parts. It is possible that the upgrade (maybe linux-image-azure?) is buggy with regard to the version of HyperV deployed at Polytechnique.

I think the best solution is to not push HyperV that hard. Let's just switch to using a fully virtual storage disk and migrate to that, and make a backup server (#20).

kousu commented 3 years ago

2020-12-02 Prevention

According to archwiki, we can avoid this in the future by appending

noauto,x-systemd.automount

to the /srv/git/repositories line in /etc/fstab. This will enable the server to boot even if the storage disk messes up.

An older alternative to the same thing is autofs. I am skeptical about systemd in general, but in this case I think it is the simpler solution.

(EDIT: x-systemd.automount is systemd's front end to the same kernel autofs mechanism; still, it is the simpler option)
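
For illustration, here is a sketch of that fstab change as a script. `add_mount_options` is a hypothetical helper, and the sample entry uses a placeholder UUID; it prints the modified file to stdout rather than editing anything in place.

```shell
#!/bin/sh
# Hypothetical helper: append mount options to the entry for one mount point.
# Usage: add_mount_options <fstab-file> <mount-point> <extra-options>
# Prints the modified fstab to stdout; the input file is not changed.
add_mount_options() {
    awk -v mp="$2" -v extra="$3" '
        $0 !~ /^[ \t]*#/ && $2 == mp { $4 = $4 "," extra }   # field 4 = options
        { print }
    ' "$1"
}

# Demonstration on a throwaway file (placeholder UUID).
demo=$(mktemp)
cat > "$demo" <<'EOF'
# datasets
UUID=0000 /srv/git/repositories ext4 errors=remount-ro 0 1
EOF
add_mount_options "$demo" /srv/git/repositories noauto,x-systemd.automount
```

Note that awk rewrites a modified line with single spaces between fields, so any column alignment on that line is lost.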

2020-12-07 Repairing

The server broke due to some kind of nasty driver bug with the storage disk: https://github.com/neuropoly/datalad/issues/21

We now have a new 1TB disk, currently at /dev/sdc. I've been promised that it is an expandable virtual disk, so that size is just a quota and not a hard upper limit, so I am going to replace the old storage disk with this one. When it runs out of space, we'll have to ask Jean-Sébastien to resize it, then grow the partition and run resize2fs to get access to the extra space (that is, until we hit linux filesystem limits, but that's for a later date).

Provisioning the new disk

root@data:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0  97.7M  1 loop /snap/core/10185
loop1    7:1    0  97.9M  1 loop /snap/core/10444
loop2    7:2    0  55.3M  1 loop /snap/core18/1885
loop3    7:3    0  55.4M  1 loop /snap/core18/1932
sda      8:0    0   127G  0 disk 
├─sda1   8:1    0   512M  0 part /boot/efi
└─sda2   8:2    0 126.5G  0 part /
sdb      8:16   0     1T  0 disk 
└─sdb1   8:17   0  1024G  0 part 
sdc      8:32   0     1T  0 disk 
sr0     11:0    1  1024M  0 rom  
root@data:~# fdisk /dev/sdc

Welcome to fdisk (util-linux 2.36).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xb5c77a8b.

Command (m for help): p

Disk /dev/sdc: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Disk model: Virtual Disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xb5c77a8b

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): 

Using default response p.
Partition number (1-4, default 1): 
First sector (2048-2147483647, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-2147483647, default 2147483647): 

Created a new partition 1 of type 'Linux' and of size 1024 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@data:~# mkfs.ext4 -L "neuropoly-data" /dev/sdc1
mke2fs 1.45.6 (20-Mar-2020)
Discarding device blocks: done                            
Creating filesystem with 268435200 4k blocks and 67108864 inodes
Filesystem UUID: d6d8c87e-fe67-4739-b44d-98f88f243364
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done     

Open up /etc/fstab to reenable the mount, swapping the new filesystem in; additionally, include the preventative measure mentioned above:

root@data:~# vi /etc/fstab 
root@data:~# cat /etc/fstab 
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda2 during installation
UUID=49cd31d6-7a4f-476a-80ba-2631bbb6a12a /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/sda1 during installation
UUID=217B-52E8  /boot/efi       vfat    umask=0077      0       1
/swapfile                                 none            swap    sw              0       0

# datasets
UUID=d6d8c87e-fe67-4739-b44d-98f88f243364 /srv/git/repositories ext4 errors=remount-ro,noauto,x-systemd.automount 0 1

Fix up the top-level filesystem permissions:

root@data:~# mount /srv/git/repositories
root@data:~# chown -R git:git /srv/git/repositories

Deal with the "lost+found" glitch in the same bad bad bad incomplete way I did before:

root@data:~# rmdir /srv/git/repositories/lost+found/

Reboot at this point to make sure it takes:

root@data:~# reboot

When logged back in, at first /srv/git/repositories is not mounted, but as soon as something touches it it shows up:

git@data:~$ mount
[...]
/dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro)
[...]
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=809468k,nr_inodes=202367,mode=700,uid=1001,gid=1001)
git@data:~$ ls -la repositories
total 8
drwxr-xr-x  2 git git 4096 Dec  8 07:12 .
drwxr-xr-x 11 git git 4096 Dec  8 04:25 ..
git@data:~$ mount
[...]
/dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro)
[...]
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=809468k,nr_inodes=202367,mode=700,uid=1001,gid=1001)
/dev/sdc1 on /srv/git/repositories type ext4 (rw,relatime,errors=remount-ro,x-systemd.automount)
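
The before/after comparison above can be scripted. A minimal sketch, assuming `mount`-style output on stdin; the helper name `is_mounted_as` is invented, and it checks the filesystem type as well as the path so that an automount placeholder of another type on the same path would not count.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `mount`-style output on stdin contains
# "<device> on <mount-point> type <fstype> (...)".
# Usage: mount | is_mounted_as <mount-point> <fstype>
is_mounted_as() {
    awk -v mp="$1" -v fs="$2" '
        $2 == "on" && $3 == mp && $4 == "type" && $5 == fs { found = 1 }
        END { exit !found }
    '
}

# Sample lines copied from the two `mount` listings above.
before='/dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro)'
after="$before
/dev/sdc1 on /srv/git/repositories type ext4 (rw,relatime,errors=remount-ro,x-systemd.automount)"

if ! printf '%s\n' "$before" | is_mounted_as /srv/git/repositories ext4; then
    echo "before first access: not mounted"
fi
if printf '%s\n' "$after" | is_mounted_as /srv/git/repositories ext4; then
    echo "after first access: mounted"
fi
```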

Recovering data

The main problem is I hadn't set up backups yet (#20). We hadn't put much data in yet, so the thought didn't cross my mind. Egg on my face. I am embarrassed.

Surveying what I have:

So here's how to get it back together:

  1. Copy in gitolite-admin from the most recent copy I have.
  2. Take ~git/.gitolite/keydir/konstantinos.pub and paste it back into ssh git@... keys add
  3. Take ~git/.gitolite/keydir/konstantinos@acheron.pub and paste it back into ssh git@... keys add
  4. Reupload the big dataset: (cd ~nguenther/datasets/large && git push origin && git annex copy --to origin)
  5. Reupload the smaller dataset: (cd ~/src/neuropoly/datalad/data-single-subject && git push internal && git annex copy --to internal)

First, save the missing pubkeys for later in case I wipe them out. If I do, I can recover them by asking Konstantinos for them, but hopefully I don't have to bother him:

nguenther@data:~/gitolite-admin$ sudo -i -u git bash
git@data:~$ cat .gitolite/keydir/konstantinos@acheron.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDUyn5vweHYvJnQcwu79yiRHwS+ZY2HcD1HShP4xJ1gKHMXlhlVbZ6gx2yYyRV6eOdKOIplyNPw5zOjd8pXYsjMLtZru2brLDNoynzwFqJY8VqfZRVhHKQnnU056dtT16Qp2u+DfeOvJhANYiSlnrMV0W+/nup4PoiWarseOPNySdeBo80k/oWJLp8kn9kXTemIa3ZOtNLWFWN4kxyVIA5F5l7rIzmpaBRjx8TuibP9afQKFLDw3vfNBEFzc0/oCYE6GWApvoxwfnP4AIHjL5WZ8TDy9I5RrlNCxhBxRVau4WXhOvAj58IiB/9I1Hi2g178qc9dTBYx0GM1Cbg7RWQsEdua6qabdE2L2wG3oPoQmfcQqtrRsFW5nfOOZ3U8hSk9YX/hlpa2y68EyC/+x2Yt9irDG6mGgfyIY3T8dhGerMgZ9BOOpVwuzVZiLrpZnPJc8kljdaiwS4Olqo9jh5FO/k7U9is56ODFyTGuQqW1H8O2BkIJtAva5E6xOTtWmwE=
git@data:~$ cat .gitolite/keydir/konstantinos.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCi8Fdy9RMf+pwQLW6h5dGRMKbnsRc2JVTC5upLdnms7cUQjJE/6sBoHbgQF9BusgYdvug8qR/HacJeEnITpndT1o2ddGVXWZdxYrBC3yRUHECUwL0oib3WgfKkYv3XfeUZgLHZTTIyUNNB44JXiVpJljBE0OImbt2tg8Lp3JNoRXlYR4iH6973BMyA0hi8aG1ubHxPxL+13NZXP81CZLI6w9s5KiQENKkF4AcmoSm8A5HDoi9Ea2YcqxwIn7jz1VryROFNoRNBT5+7ldw3GAHXl4uGoji9rfUlXSHKLsxA4ZG3lum6jVgMz9Wpe0uIYLbpa0g8V0Yzr9TkjVWR9IQZ u111358@rosenberg
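
A minimal sketch of the same precaution as a file copy rather than a paste into the terminal. It assumes the keydir path shown above; the `save_pubkeys` name and the destination directory are hypothetical, and it only picks up top-level `*.pub` files.

```shell
# Sketch: copy every public key out of a gitolite keydir before touching it.
# Example call (both paths are assumptions):
#   save_pubkeys ~git/.gitolite/keydir ~/keydir-backup
save_pubkeys() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for key in "$src"/*.pub; do
        [ -e "$key" ] || continue      # the glob matched nothing
        # -p keeps modes and timestamps so the copies can be compared later
        cp -p "$key" "$dest"/ || return 1
    done
}
```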

Now, reinit the core repo. I think the spare copy I had on the server is pretty good: it's from November 10th and the outage started on November 14th, so not much could have happened in between:

nguenther@data:~$ cd gitolite-admin/
nguenther@data:~/gitolite-admin$ pwd
/home/nguenther/gitolite-admin
nguenther@data:~/gitolite-admin$ git log -n 1 master
commit 3b3ad80bf8c20e424b8d345c46c5a519401cc79b (HEAD -> master, origin/master, origin/HEAD)
Author: git on data.neuro.polymtl.ca <git@data.neuro.polymtl.ca>
Date:   Tue Nov 10 14:39:40 2020 -0500

    keys: add nguenther@server (SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA)
nguenther@data:~/gitolite-admin$ git push --all origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
FATAL: W any gitolite-admin nguenther DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

hm. I guess it needs at least an empty repo on the other side. Oh yes, that makes sense; gitolite-admin is not a wildrepo, so it won't be created on a push.

Two steps then: a) Recover from the earlier backup, b) push the more recent backup.

nguenther@data:~/gitolite-admin$ sudo -i -u git bash
git@data:~$ rsync -av repositories.bak/gitolite-admin.git repositories/
sending incremental file list
gitolite-admin.git/
gitolite-admin.git/COMMIT_EDITMSG
gitolite-admin.git/HEAD
gitolite-admin.git/config
gitolite-admin.git/description
gitolite-admin.git/gl-conf
gitolite-admin.git/index
gitolite-admin.git/branches/
gitolite-admin.git/hooks/
gitolite-admin.git/hooks/applypatch-msg.sample
gitolite-admin.git/hooks/commit-msg.sample
gitolite-admin.git/hooks/fsmonitor-watchman.sample
gitolite-admin.git/hooks/post-update -> /srv/git/.gitolite/hooks/gitolite-admin/post-update
gitolite-admin.git/hooks/post-update.sample
gitolite-admin.git/hooks/pre-applypatch.sample
gitolite-admin.git/hooks/pre-commit.sample
gitolite-admin.git/hooks/pre-merge-commit.sample
gitolite-admin.git/hooks/pre-push.sample
gitolite-admin.git/hooks/pre-rebase.sample
gitolite-admin.git/hooks/pre-receive.sample
gitolite-admin.git/hooks/prepare-commit-msg.sample
gitolite-admin.git/hooks/update -> /srv/git/.gitolite/hooks/common/update
gitolite-admin.git/hooks/update.sample
gitolite-admin.git/info/
gitolite-admin.git/info/exclude
gitolite-admin.git/logs/
gitolite-admin.git/logs/HEAD
gitolite-admin.git/logs/refs/
gitolite-admin.git/logs/refs/heads/
gitolite-admin.git/logs/refs/heads/master
gitolite-admin.git/objects/
gitolite-admin.git/objects/03/
gitolite-admin.git/objects/03/66efe7cb67b68f1830d99a67ae65166d6c3471
gitolite-admin.git/objects/06/
gitolite-admin.git/objects/06/7bb96fd8d465342925fae7de0fd2856a8440f8
gitolite-admin.git/objects/0a/
gitolite-admin.git/objects/0a/d7680237d632cb2dbca04964f5d19ceba72089
gitolite-admin.git/objects/1c/
gitolite-admin.git/objects/1c/bd348fa9ea330f1ab3f6aaec0019d3bafe05a5
gitolite-admin.git/objects/1d/
gitolite-admin.git/objects/1d/58f0b8835b052819efc21435c1eee0afbdbeb6
gitolite-admin.git/objects/3b/
gitolite-admin.git/objects/3b/f64657d682cfa84a1dfbd8847810936281de60
gitolite-admin.git/objects/3e/
gitolite-admin.git/objects/3e/93a6d538cdaef6a30a8a50f2074eed75c76679
gitolite-admin.git/objects/45/
gitolite-admin.git/objects/45/7f2d687c0fcf2091cbd231bed079b3236490d1
gitolite-admin.git/objects/46/
gitolite-admin.git/objects/46/5c1a35ecfb7b2ea2ddc6d55cb3df3ca772c8c6
gitolite-admin.git/objects/55/
gitolite-admin.git/objects/55/36d7abf29963dc27b82ad2c2c9de6aa7658d68
gitolite-admin.git/objects/5d/
gitolite-admin.git/objects/5d/ef6d56a44cb84f0f058d1ff5af8fc3022bc82d
gitolite-admin.git/objects/62/
gitolite-admin.git/objects/62/6538ba785086bd04aa612db2b6135cd3cd8d1a
gitolite-admin.git/objects/67/
gitolite-admin.git/objects/67/e1ce6ec686f106d660f174a3ed8f34bbaa731f
gitolite-admin.git/objects/6b/
gitolite-admin.git/objects/6b/3d7a4d6396b2b97d90526d7f2d70c991feeecc
gitolite-admin.git/objects/6b/910bb5a57aa5558a241850ecdd091bce0f6dcd
gitolite-admin.git/objects/6c/
gitolite-admin.git/objects/6c/0fff4de3e1a4ed04ae777f442c99ecee292420
gitolite-admin.git/objects/6c/e13516a8668d9df3e9dd93518dd3d59a0396a7
gitolite-admin.git/objects/70/
gitolite-admin.git/objects/70/7cbfcdb11ad51df0ea7b06a7b16336326b8f16
gitolite-admin.git/objects/71/
gitolite-admin.git/objects/71/a148807b0c0c1e5ef07a344c3783b5c9fbeb95
gitolite-admin.git/objects/79/
gitolite-admin.git/objects/79/d00243ce64b347092bdf7c19d3bc2be2d89fc5
gitolite-admin.git/objects/7f/
gitolite-admin.git/objects/7f/6f3ef81ff6f5e216fe04b794393e26a4d8e482
gitolite-admin.git/objects/90/
gitolite-admin.git/objects/90/323dfc76c2ec44592372a8fd010214d8cd4285
gitolite-admin.git/objects/96/
gitolite-admin.git/objects/96/447717c453e4269f226ac24dd49b4c76099393
gitolite-admin.git/objects/9d/
gitolite-admin.git/objects/9d/5bdfa9dee355bc475022fa5f3e5555a4c20747
gitolite-admin.git/objects/9e/
gitolite-admin.git/objects/9e/64c62987ebfcb0ccd03bfa675d445bf0d912a2
gitolite-admin.git/objects/a8/
gitolite-admin.git/objects/a8/f2963509fd9cc3f808dea9be3e91a9b6d76a75
gitolite-admin.git/objects/ac/
gitolite-admin.git/objects/ac/cb71aaff29e46ca82a2b723ec113c78930a093
gitolite-admin.git/objects/af/
gitolite-admin.git/objects/af/b338306a84dc348494f012327183bbd048a050
gitolite-admin.git/objects/b2/
gitolite-admin.git/objects/b2/7e2bbc1e419146c32261cadb90ba00c59fd6d6
gitolite-admin.git/objects/c1/
gitolite-admin.git/objects/c1/973eb6f81cf617d41171af7848d400a6999fdd
gitolite-admin.git/objects/d5/
gitolite-admin.git/objects/d5/1af9c42b41f45722a518037fd13457ea64ed9e
gitolite-admin.git/objects/d6/
gitolite-admin.git/objects/d6/17788e10970e41e53ae258e9080a2e4d43a779
gitolite-admin.git/objects/dc/
gitolite-admin.git/objects/dc/cfb341c43fd7e00c2eb17722c45fff338de09c
gitolite-admin.git/objects/e9/
gitolite-admin.git/objects/e9/c605c361a863bc2ec65aa583ed45c167f16758
gitolite-admin.git/objects/ea/
gitolite-admin.git/objects/ea/9d22a721be3d6298bdac9219a96ceaedd905a5
gitolite-admin.git/objects/fc/
gitolite-admin.git/objects/fc/640bb3e66e57bf2e72545623bbe6c4f9ba0285
gitolite-admin.git/objects/info/
gitolite-admin.git/objects/pack/
gitolite-admin.git/refs/
gitolite-admin.git/refs/heads/
gitolite-admin.git/refs/heads/master
gitolite-admin.git/refs/tags/

sent 38,288 bytes  received 1,320 bytes  79,216.00 bytes/sec
total size is 31,802  speedup is 0.80
git@data:~$ exit
nguenther@data:~/gitolite-admin$ git push --all origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 533 bytes | 106.00 KiB/s, done.
Total 4 (delta 1), reused 3 (delta 0), pack-reused 0
To data.neuro.polymtl.ca:gitolite-admin
   9d5bdfa..3b3ad80  master -> master
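
The first of those two steps can be sketched as a small guard that restores a bare repo from the backup copy only when it is missing, so an existing repo is never overwritten. `restore_repo_if_missing` is a hypothetical helper; the paths in the example call are the ones from the transcript above.

```shell
# Sketch: restore a bare repo from a backup copy only if it does not exist yet.
# Example call, run as the git user from its home directory:
#   restore_repo_if_missing repositories.bak/gitolite-admin.git repositories
restore_repo_if_missing() {
    backup=$1          # no trailing slash, so rsync copies the directory itself
    target_dir=$2
    name=${backup##*/}
    if [ -d "$target_dir/$name" ]; then
        echo "$target_dir/$name already exists, leaving it alone" >&2
        return 0
    fi
    # -a preserves the hook symlinks and permissions
    rsync -a "$backup" "$target_dir"/
}
```

After that, the `git push --all origin` from the newer working copy goes through as shown above.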

Test:

nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca info
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
hello nguenther, this is git@data running gitolite3 3.6.11-2 (Debian) on git 2.27.0

 R W C  CREATOR/..*
 R W C  datasets/..*
 R W    gitolite-admin

Re-add Konstantinos:

nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys add konstantinos
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDUyn5vweHYvJnQcwu79yiRHwS+ZY2HcD1HShP4xJ1gKHMXlhlVbZ6gx2yYyRV6eOdKOIplyNPw5zOjd8pXYsjMLtZru2brLDNoynzwFqJY8VqfZRVhHKQnnU056dtT16Qp2u+DfeOvJhANYiSlnrMV0W+/nup4PoiWarseOPNySdeBo80k/oWJLp8kn9kXTemIa3ZOtNLWFWN4kxyVIA5F5l7rIzmpaBRjx8TuibP9afQKFLDw3vfNBEFzc0/oCYE6GWApvoxwfnP4AIHjL5WZ8TDy9I5RrlNCxhBxRVau4WXhOvAj58IiB/9I1Hi2g178qc9dTBYx0GM1Cbg7RWQsEdua6qabdE2L2wG3oPoQmfcQqtrRsFW5nfOOZ3U8hSk9YX/hlpa2y68EyC/+x2Yt9irDG6mGgfyIY3T8dhGerMgZ9BOOpVwuzVZiLrpZnPJc8kljdaiwS4Olqo9jh5FO/k7U9is56ODFyTGuQqW1H8O2BkIJtAva5E6xOTtWmwE=
^D
Added SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys add konstantinos@rosenberg
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCi8Fdy9RMf+pwQLW6h5dGRMKbnsRc2JVTC5upLdnms7cUQjJE/6sBoHbgQF9BusgYdvug8qR/HacJeEnITpndT1o2ddGVXWZdxYrBC3yRUHECUwL0oib3WgfKkYv3XfeUZgLHZTTIyUNNB44JXiVpJljBE0OImbt2tg8Lp3JNoRXlYR4iH6973BMyA0hi8aG1ubHxPxL+13NZXP81CZLI6w9s5KiQENKkF4AcmoSm8A5HDoi9Ea2YcqxwIn7jz1VryROFNoRNBT5+7ldw3GAHXl4uGoji9rfUlXSHKLsxA4ZG3lum6jVgMz9Wpe0uIYLbpa0g8V0Yzr9TkjVWR9IQZ u111358@rosenberg
^D
Added SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys list
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Hello nguenther, you are an admin.

These are all registered keys:
============================
1: SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alfoi.pub
2: SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub
3: SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub
4: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
5: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
6: SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
7: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub
8: SHA256:6w9uivbXYfjnDEz3NukOB3L9IZFdHj8qZn0BXiSTl4o : nguenther@requiem.pub
9: SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA : nguenther@server.pub

Hm, a weird inconsistency: I accidentally renamed konstantinos@acheron.pub -> konstantinos.pub and konstantinos.pub -> konstantinos@rosenberg.pub, and now gitolite shows all three names. I'm surprised? I would think

nguenther@data:~/gitolite-admin$ git pull --rebase
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
Unpacking objects: 100% (12/12), 1.91 KiB | 1.92 MiB/s, done.
remote: Total 12 (delta 4), reused 0 (delta 0), pack-reused 0
From data.neuro.polymtl.ca:gitolite-admin
   3b3ad80..eac44aa  master     -> origin/master
Updating 3b3ad80..eac44aa
Fast-forward
 keydir/konstantinos.pub           | 2 ++
 keydir/konstantinos@rosenberg.pub | 1 +
 2 files changed, 3 insertions(+)
 create mode 100644 keydir/konstantinos.pub
 create mode 100644 keydir/konstantinos@rosenberg.pub
nguenther@data:~/gitolite-admin$ ls keydir/
alfoi.pub  andreannelemay.pub  jcohen.pub  konstantinos.pub  konstantinos@rosenberg.pub  nguenther.pub  nguenther@requiem.pub  nguenther@server.pub
nguenther@data:~/gitolite-admin$ 

They're not in the admin repo. The wrong ones must be literally...on the disk. Ugh. That's the first big strike I've had against gitolite.

Patch this over in this very silly way:

nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys list
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Hello nguenther, you are an admin.

These are all registered keys:
============================
1: SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alfoi.pub
2: SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub
3: SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub
4: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
5: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
6: SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
7: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub
8: SHA256:6w9uivbXYfjnDEz3NukOB3L9IZFdHj8qZn0BXiSTl4o : nguenther@requiem.pub
9: SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA : nguenther@server.pub

nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys add konstantinos@acheron
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDUyn5vweHYvJnQcwu79yiRHwS+ZY2HcD1HShP4xJ1gKHMXlhlVbZ6gx2yYyRV6eOdKOIplyNPw5zOjd8pXYsjMLtZru2brLDNoynzwFqJY8VqfZRVhHKQnnU056dtT16Qp2u+DfeOvJhANYiSlnrMV0W+/nup4PoiWarseOPNySdeBo80k/oWJLp8kn9kXTemIa3ZOtNLWFWN4kxyVIA5F5l7rIzmpaBRjx8TuibP9afQKFLDw3vfNBEFzc0/oCYE6GWApvoxwfnP4AIHjL5WZ8TDy9I5RrlNCxhBxRVau4WXhOvAj58IiB/9I1Hi2g178qc9dTBYx0GM1Cbg7RWQsEdua6qabdE2L2wG3oPoQmfcQqtrRsFW5nfOOZ3U8hSk9YX/hlpa2y68EyC/+x2Yt9irDG6mGgfyIY3T8dhGerMgZ9BOOpVwuzVZiLrpZnPJc8kljdaiwS4Olqo9jh5FO/k7U9is56ODFyTGuQqW1H8O2BkIJtAva5E6xOTtWmwE=
Added SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys del konstantinos
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Removed SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos.pub
nguenther@data:~/gitolite-admin$ ssh git@data.neuro.polymtl.ca keys list
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Hello nguenther, you are an admin.

These are all registered keys:
============================
1: SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alfoi.pub
2: SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub
3: SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub
4: SHA256:ya5itEhcKIdveMBnqFjRpDhxRjeicw4VDsP8NKWiRls : konstantinos@acheron.pub
5: SHA256:Vf49YizTm3zjDtLM3bQ7haK0zsoausiJ8xG+9/LPTPE : konstantinos@rosenberg.pub
6: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub
7: SHA256:6w9uivbXYfjnDEz3NukOB3L9IZFdHj8qZn0BXiSTl4o : nguenther@requiem.pub
8: SHA256:YjIcdy0fnALMCfT8YEx7x6eexXyugvvfuHKRCRT48vA : nguenther@server.pub
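
One way to spot a key registered under two names, like entries 4 and 5 in the earlier listing, is to compare the key material instead of the file names. This is only a sketch: `find_duplicate_keys` is a hypothetical helper, it assumes plain one-line OpenSSH public keys and file names without spaces, and it only looks at whatever directory it is pointed at.

```shell
# Sketch: list key files in a directory that contain the same public key.
# Example call (path is an assumption): find_duplicate_keys ~git/.gitolite/keydir
find_duplicate_keys() {
    keydir=$1
    # Field 2 of a public key line is the base64 key blob; two files with the
    # same blob hold the same key, whatever their names or comments say.
    for f in "$keydir"/*.pub; do
        [ -e "$f" ] || continue
        awk -v name="${f##*/}" 'NF >= 2 { print $2, name }' "$f"
    done | sort | awk '
        $1 == prev { if (!shown) print prevname; print $2; shown = 1; next }
        { prev = $1; prevname = $2; shown = 0 }'
}
```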

Recovering Datasets

nguenther@data:~$ mv datasets/large/ datasets/sct-testing-large # catch up with the repo rename we did
nguenther@data:~$ cd datasets/sct-testing-large/
nguenther@data:~/datasets/sct-testing-large$ git remote -v  # the remotes are already set correctly, though
origin  git@data.neuro.polymtl.ca:datasets/sct-testing-large.git (fetch)
origin  git@data.neuro.polymtl.ca:datasets/sct-testing-large.git (push)

Upload the git part:

nguenther@data:~/datasets/sct-testing-large$ git push --all origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Initialized empty Git repository in /srv/git/repositories/datasets/sct-testing-large.git/
Enumerating objects: 84314, done.
Counting objects: 100% (84314/84314), done.
Compressing objects: 100% (62957/62957), done.
Writing objects: 100% (84314/84314), 7.30 MiB | 19.17 MiB/s, done.
Total 84314 (delta 25819), reused 74365 (delta 15870), pack-reused 0
remote: Resolving deltas: 100% (25819/25819), done.
To data.neuro.polymtl.ca:datasets/sct-testing-large.git
 * [new branch]          git-annex -> git-annex
 * [new branch]          master -> master
 * [new branch]          synced/master -> synced/master

great!

But, new problem when trying to upload the annex part:

nguenther@data:~/datasets/sct-testing-large$ git annex copy --to origin
[...]
copy derivatives/labels/sub-amuAMU15005/anat/sub-amuAMU15005_T2star_gmseg-manual.nii.gz (checking origin...) git-annex-shell: expected repository UUID de2707ce-a9b6-4815-9f3d-edff5c166624 but found uninitialized repository
(to origin...) 
git-annex-shell: expected repository UUID de2707ce-a9b6-4815-9f3d-edff5c166624 but found uninitialized repository
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [sender=3.2.3]
rsync exited 12

  rsync failed -- run git annex again to resume file transfer
failed
copy derivatives/labels/sub-amuAMU15005/anat/sub-amuAMU15005_T2star_seg-manual.nii.gz (checking origin...) git-annex-shell: expected repository UUID de2707ce-a9b6-4815-9f3d-edff5c166624 but found uninitialized repository
[...]

git-annex does location tracking, which would be more helpful if it weren't so tightly integrated, I think. It is expecting to find a repo that isn't there anymore, and it's balking. But I uploaded to an empty repo before, so what's the difference?

I looked around and thought and basically just lucked into realizing the stale UUID was probably cached in .git/config, and sure enough:

nguenther@data:~/datasets/sct-testing-large$ git config --unset remote.origin.annex-uuid
nguenther@data:~/datasets/sct-testing-large$ time git annex copy --to origin
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
copy derivatives/labels/sub-amuAMU15001/anat/sub-amuAMU15001_T2star_gmseg-manual.nii.gz (to origin...) 
ok                                
copy derivatives/labels/sub-amuAMU15001/anat/sub-amuAMU15001_T2star_seg-manual.nii.gz (to origin...) 
ok                                
copy derivatives/labels/sub-amuAMU15002/anat/sub-amuAMU15002_T2star_gmseg-manual.nii.gz (to origin...) 
ok                                
copy derivatives/labels/sub-amuAMU15002/anat/sub-amuAMU15002_T2star_seg-manual.nii.gz (to origin...) 
ok                                
copy derivatives/labels/sub-amuAMU15003/anat/sub-amuAMU15003_T2star_gmseg-manual.nii.gz (to origin...)
[...] 
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-3_T1w.nii.gz (to origin...) 
ok                                
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-4_T1w.nii.gz (to origin...) 
ok                                
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-5_T1w.nii.gz (to origin...) 
ok                                
copy sub-zurichMPM05_ses-02/anat/sub-zurichMPM05_ses-02_echo-6_T1w.nii.gz (to origin...) 
ok                                
(recording state in git...)

real    11m23.040s
user    0m37.712s
sys 0m37.427s
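
Since the old UUID is needed again in the next step, it helps to print it before unsetting it. A minimal sketch, assuming it is run inside the dataset checkout; `forget_remote_uuid` is a hypothetical name, and the only git calls used are `git config --get` and `git config --unset`.

```shell
# Sketch: drop the cached annex UUID for a remote, printing the old value
# so it can be marked dead afterwards.
# Example call: OLD=$(forget_remote_uuid origin)
forget_remote_uuid() {
    remote=${1:-origin}
    old=$(git config --get "remote.$remote.annex-uuid") || {
        echo "no cached annex-uuid for remote '$remote'" >&2
        return 1
    }
    git config --unset "remote.$remote.annex-uuid" || return 1
    echo "$old"
}
```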

And now that I've done this I should make sure to mark the missing repo as dead; its UUID is what git config remote.origin.annex-uuid returned before I unset it:

nguenther@data:~/datasets/sct-testing-large$ git annex dead de2707ce-a9b6-4815-9f3d-edff5c166624
dead de2707ce-a9b6-4815-9f3d-edff5c166624 ok
(recording state in git...)
nguenther@data:~/datasets/sct-testing-large$ git annex sync --content origin
commit 
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
ok
pull origin 
Enter passphrase for key '/home/nguenther/.ssh/id_ed25519': 
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
ok
push origin 
Enumerating objects: 32584, done.
Counting objects: 100% (32584/32584), done.
Compressing objects: 100% (21197/21197), done.
Writing objects: 100% (22643/22643), 2.05 MiB | 11.75 MiB/s, done.
Total 22643 (delta 9546), reused 13613 (delta 516), pack-reused 0
remote: Resolving deltas: 100% (9546/9546), completed with 9030 local objects.
To data.neuro.polymtl.ca:datasets/sct-testing-large.git
 * [new branch]          git-annex -> synced/git-annex
ok

Spot check that git-annex thinks there's now only one copy of each thing:

nguenther@data:~/datasets/sct-testing-large$ git annex whereis derivatives/labels/sub-bwh028/
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-ax_T2w_lesion-manual.nii.gz (1 copy) 
    6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-ax_T2w_seg-manual.nii.gz (1 copy) 
    6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sag_T2w_labels-disc-manual.nii.gz (1 copy) 
    6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sagstir_T2w_labels-disc-manual.nii.gz (1 copy) 
    6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sagstir_T2w_lesion-manual.nii.gz (1 copy) 
    6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
whereis derivatives/labels/sub-bwh028/anat/sub-bwh028_acq-sagstir_T2w_seg-manual.nii.gz (1 copy) 
    6c8420e2-ee60-4383-96ba-cb43ef3c5611 -- origin
ok
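
For more than a spot check, the same output can be filtered for files whose copy count is not the expected one. This sketch parses the plain-text `whereis` lines exactly as they appear above, so it is fragile by design; `unexpected_copies` is a hypothetical helper.

```shell
# Sketch: read `git annex whereis` text output on stdin and print
# "<count> <path>" for every file whose copy count differs from the wanted one.
# Example call: git annex whereis derivatives/labels/sub-bwh028/ | unexpected_copies 1
unexpected_copies() {
    want=$1
    awk -v want="$want" '
        /^whereis / && match($0, /\([0-9]+ cop(y|ies)\) *$/) {
            count = substr($0, RSTART + 1) + 0     # "1 copy) " converts to 1
            path = substr($0, 9, RSTART - 10)      # between "whereis " and " ("
            if (count != want) print count, path
        }'
}
```

No output means every listed file has the expected number of copies.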

Here's the upload for the other dataset:

[kousu@requiem data-single-subject]$ git config --unset remote.internal.annex-uuid
[kousu@requiem data-single-subject]$ git push --all internal
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github': 
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github': 
Initialized empty Git repository in /srv/git/repositories/datasets/data-single-subject.git/
Enumerating objects: 3169, done.
Counting objects: 100% (3169/3169), done.
Delta compression using up to 4 threads
Compressing objects: 100% (1593/1593), done.
Writing objects: 100% (3169/3169), 286.13 KiB | 14.31 MiB/s, done.
Total 3169 (delta 1570), reused 1979 (delta 829), pack-reused 0
remote: Resolving deltas: 100% (1570/1570), done.
To 132.207.65.204:datasets/data-single-subject.git
 * [new branch]      git-annex -> git-annex
 * [new branch]      master -> master
 * [new branch]      synced/master -> synced/master
[kousu@requiem data-single-subject]$ git config annex.sshcaching true # necessary to avoid
[kousu@requiem data-single-subject]$ time git annex copy --to internal
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github': I_r_labels-manual.nii.gz 
(to internal...) 
ok                                
copy derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (to internal...) 
ok                                
copy derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (to internal...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_T1w.nii.gz (to internal...) 
[...]
copy sub-unf/anat/sub-unf_T1w.nii.gz (to internal...) 
ok                                 
copy sub-unf/anat/sub-unf_T2star.nii.gz (to internal...) 
ok                                
copy sub-unf/anat/sub-unf_T2w.nii.gz (to internal...) 
ok                                 
copy sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (to internal...) 
ok                                
copy sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (to internal...) 
ok                                
copy sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (to internal...) 
ok                                
copy sub-unf/dwi/sub-unf_dwi.nii.gz (to internal...) 
ok                                
(recording state in git...)

real    12m34.268s
user    0m6.017s
sys 0m4.351s

And to dead the repo:

[kousu@requiem data-single-subject]$ git annex whereis sub-unf/dwi/
whereis sub-unf/dwi/sub-unf_dwi.nii.gz (5 copies) 
    5ca3a9a5-ac75-410e-8dcd-8a24463f08fa -- julien@julien-macbook.local:~/code/spine-generic/data-single-subject
    74ec6586-6ac2-4700-892e-56f55ac5544b
    8aea80c3-2550-4340-8d36-42af5475c103 -- internal
    c99162a2-3e7d-4100-82e7-1e077a0793f6 -- [amazon]
    e80b53d8-6bf7-4996-a918-4c284c440217 -- kousu@requiem:~/src/neuropoly/datalad/data-single-subject [here]

  amazon: https://data-single-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s2744719--1427176f437e7980602d3e8b355750df823f971e613acc3abcf1bc32b4944430.nii.gz
ok

Ah, there's actually my laptop and @jcohenadad's laptop listed here, in addition to the new repo (8aea80c3-2550-4340-8d36-42af5475c103) and the old one (74ec6586-6ac2-4700-892e-56f55ac5544b). Mark all of them dead except the new one:

[kousu@requiem data-single-subject]$ git annex dead here
dead here (recording state in git...)
ok
(recording state in git...)
[kousu@requiem data-single-subject]$ git annex dead 5ca3a9a5-ac75-410e-8dcd-8a24463f08fa
dead 5ca3a9a5-ac75-410e-8dcd-8a24463f08fa ok
(recording state in git...)

But, funny, when I try the same with the old repo's UUID:

[kousu@requiem data-single-subject]$ git annex dead 74ec6586-6ac2-4700-892e-56f55ac5544b
git-annex: there is no available git remote named "74ec6586-6ac2-4700-892e-56f55ac5544b"

Okay here's a very silly workaround:

[kousu@requiem data-single-subject]$ NEW=$(git config remote.internal.annex-uuid); echo $NEW
8aea80c3-2550-4340-8d36-42af5475c103
[kousu@requiem data-single-subject]$ git config remote.internal.annex-uuid 74ec6586-6ac2-4700-892e-56f55ac5544b
[kousu@requiem data-single-subject]$ git annex whereis sub-unf/dwi/
whereis sub-unf/dwi/sub-unf_dwi.nii.gz (3 copies) 
    74ec6586-6ac2-4700-892e-56f55ac5544b -- internal
    8aea80c3-2550-4340-8d36-42af5475c103
    c99162a2-3e7d-4100-82e7-1e077a0793f6 -- [amazon]

  amazon: https://data-single-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s2744719--1427176f437e7980602d3e8b355750df823f971e613acc3abcf1bc32b4944430.nii.gz
ok
[kousu@requiem data-single-subject]$ git annex dead 74ec6586-6ac2-4700-892e-56f55ac5544b
dead 74ec6586-6ac2-4700-892e-56f55ac5544b ok
(recording state in git...)
[kousu@requiem data-single-subject]$ git config remote.internal.annex-uuid "$NEW"
[kousu@requiem data-single-subject]$ git annex whereis sub-unf/dwi/
whereis sub-unf/dwi/sub-unf_dwi.nii.gz (2 copies) 
    8aea80c3-2550-4340-8d36-42af5475c103 -- internal
    c99162a2-3e7d-4100-82e7-1e077a0793f6 -- [amazon]

  amazon: https://data-single-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com/SHA256E-s2744719--1427176f437e7980602d3e8b355750df823f971e613acc3abcf1bc32b4944430.nii.gz
ok
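
The workaround above can be wrapped so the real UUID is always put back, even if marking the old one dead fails. This is a sketch of the same three steps and nothing more; `dead_by_uuid` is a hypothetical name.

```shell
# Sketch: temporarily point a remote at an old annex UUID, mark that UUID
# dead, then restore the remote's current UUID.
# Example call: dead_by_uuid internal 74ec6586-6ac2-4700-892e-56f55ac5544b
dead_by_uuid() {
    remote=$1
    old_uuid=$2
    new_uuid=$(git config --get "remote.$remote.annex-uuid") || return 1
    git config "remote.$remote.annex-uuid" "$old_uuid" || return 1
    git annex dead "$old_uuid"
    status=$?
    # Restore the real UUID whether or not the dead step succeeded
    git config "remote.$remote.annex-uuid" "$new_uuid" || return 1
    return $status
}
```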

And a final sync up to catch all the metadata branches:

[kousu@requiem data-single-subject]$ git annex sync --content internal
commit 
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
ok
pull internal 
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github': 
ok
push internal 
Enumerating objects: 818, done.
Counting objects: 100% (818/818), done.
Delta compression using up to 4 threads
Compressing objects: 100% (300/300), done.
Writing objects: 100% (442/442), 32.26 KiB | 971.00 KiB/s, done.
Total 442 (delta 292), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (292/292), completed with 243 local objects.
To 132.207.65.204:datasets/data-single-subject.git
 * [new branch]      git-annex -> synced/git-annex
ok

Check that everything looks good:

[kousu@requiem data-single-subject]$ ssh git@data.neuro.polymtl.ca
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github': 
PTY allocation request failed on channel 0
hello nguenther, this is git@data running gitolite3 3.6.11-2 (Debian) on git 2.27.0

 R W C  CREATOR/..*
 R W C  datasets/..*
 R W    datasets/data-single-subject
 R W    datasets/sct-testing-large
 R W    gitolite-admin
Connection to data.neuro.polymtl.ca closed.

And check that on the server side, the sizes are what I remember:

nguenther@data:~/datasets/sct-testing-large$ sudo -i -u git bash
[sudo] password for nguenther: 
git@data:~$ cd repositories
git@data:~/repositories$ du -hs datasets/*
884M    datasets/data-single-subject.git
19G datasets/sct-testing-large.git
git@data:~/repositories$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1      1007G   20G  937G   2% /srv/git/repositories

kousu commented 3 years ago

See #22.

kousu commented 3 years ago

2020-??-?? Ensuring Access Part II (Console)

Jean-Sébastien Décarie has said he can give us access to the server's console. HyperV consoles run over RDP (unlike QEMU's, which run over VNC); supposedly this command line will let a Linux user connect:

xfreerdp --ignore-certificate --no-nego -t 2179 -u $username --pcb $vmid $hypervhost

However, JS said he is not ready to grant us the rights to this: the HyperV network is not segregated enough. He thinks it is possible but it will take him some time to redesign, so we shouldn't hold our breath.

This was Polytechnique Ticket/8478, but it's closed for now.

kousu commented 3 years ago

The only noticeable fallout from this seems to be that, like with a bad force-push, everyone who had copies of the repos needs to acknowledge the glitch. Here, that means running:

git config --unset remote.origin.annex-uuid

for each repo they had checked out. This was just me, Alex, Julien and Konstantinos, and I've walked all of them through fixing it.
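
For someone with several checkouts, the same fix can be looped over them. A minimal sketch, assuming the remote is called `origin` in every checkout; `forget_origin_uuid_in` and the example paths are hypothetical.

```shell
# Sketch: clear the cached annex UUID for "origin" in each given checkout.
# Example call (paths are assumptions):
#   forget_origin_uuid_in ~/datasets/sct-testing-large ~/datasets/data-single-subject
forget_origin_uuid_in() {
    for repo in "$@"; do
        if git -C "$repo" config --get remote.origin.annex-uuid >/dev/null; then
            git -C "$repo" config --unset remote.origin.annex-uuid &&
                echo "cleared: $repo"
        else
            echo "nothing cached: $repo"
        fi
    done
}
```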

kousu commented 3 years ago

I'll reopen if I find more glitches but I think this is done.

kousu commented 3 years ago

To better handle this in the future, we now have https://monitor.neuro.polymtl.ca (https://github.com/neuropoly/computers/issues/4).