neuropoly / data-management

Repo that deals with datalad aspects for internal use

Internal server (git+ssh://data.neuro.polymtl.ca) #22

Closed kousu closed 3 years ago

kousu commented 3 years ago

I am deploying a git server on Polytechnique's internal infrastructure. This is cheaper and faster (bandwidth-wise) than paying Amazon or GitHub to host the large datasets we experiment on, and safer for our datasets containing privileged medical data.

kousu commented 3 years ago

The first debate was what software to use: GitLab, Gitea, or Gitolite.

I went with Gitolite because it has no web interface. Web interfaces are handy for users but a nightmare to maintain, especially since this internal server isn't going to have a web presence, so getting a cert for it from Let's Encrypt is extremely tricky. Most of our interaction with the server will be over ssh anyway, especially since it needs git-annex in the mix, so I think this is okay. It also lets us sidestep the need for certs, because ssh has its own TOFU-based (trust on first use) host authentication.

Also, GitLab removed git-annex support, which is fine, I know they have their reasons, but that makes it incompatible with datalad, so that's right out for us.

Here's a pretty good endorsement from a lab much like ours: https://caesr.uwaterloo.ca/wildrepos-in-gitolite/

kousu commented 3 years ago

The first step was to name the server:

echo donnees.neuro.polymtl.ca > /etc/hostname

And to ask IT at Polytechnique to set up matching DNS records (A and PTR).

kousu commented 3 years ago

Next was to get git, gitolite and git-annex installed. git-annex needs a server component; it's smaller than what git-lfs needs, but it's still there. Since this is Ubuntu 18.04, its packaged git-annex is far too old, so we need to install git-annex from elsewhere; I picked NeuroDebian.

These instructions were from http://neuro.debian.net/install_pkg.html?p=git-annex-standalone:

root@donnees:/home/nguenther# wget -O- http://neuro.debian.net/lists/bionic.us-ca.full | sudo tee /etc/apt/sources.list.d/neurodebian.sources.list
--2020-09-22 14:15:35--  http://neuro.debian.net/lists/bionic.us-ca.full
Resolving neuro.debian.net (neuro.debian.net)... 129.170.233.11
Connecting to neuro.debian.net (neuro.debian.net)|129.170.233.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 262
Saving to: ‘STDOUT’

deb http://neurodeb.pirsquared.org data main contrib non-free
#deb-src http://neurodeb.pirsquared.org data main contrib non-free
deb http://neurodeb.pirsquared.org bionic main contrib non-free
#deb-src http://neurodeb.pirsquared.org bionic main contrib non-free
-                                     100%[=======================================================================>]     262  --.-KB/s    in 0s      

2020-09-22 14:15:35 (32.9 MB/s) - written to stdout [262/262]

root@donnees:/home/nguenther# sudo apt-key adv --recv-keys --keyserver hkp://pool.sks-keyservers.net:80 0xA5D32F012649A5A9
Executing: /tmp/apt-key-gpghome.m1EYN96iWP/gpg.1.sh --recv-keys --keyserver hkp://pool.sks-keyservers.net:80 0xA5D32F012649A5A9
gpg: key A5D32F012649A5A9: 7 signatures not checked due to missing keys
gpg: key A5D32F012649A5A9: "NeuroDebian Archive Key <pkg-exppsy-maintainers@lists.alioth.debian.org>" 1 new user ID
gpg: key A5D32F012649A5A9: "NeuroDebian Archive Key <pkg-exppsy-maintainers@lists.alioth.debian.org>" 8 new signatures
gpg: Total number processed: 1
gpg:           new user IDs: 1
gpg:         new signatures: 8

Now we can install the three parts:

root@donnees:/home/nguenther# DEBIAN_FRONTEND=noninteractive apt-get install -y git git-annex-standalone gitolite3
Reading package lists... Done
Building dependency tree       
Reading state information... Done
git is already the newest version (1:2.24.0-1~nd18.04+1).
Suggested packages:
  xdot bup adb tor magic-wormhole tahoe-lafs uftp git-daemon-sysvinit gitweb
The following NEW packages will be installed:
  git-annex-standalone git-remote-gcrypt gitolite3
0 upgraded, 3 newly installed, 0 to remove and 15 not upgraded.
Need to get 64.1 MB/64.3 MB of archives.
After this operation, 189 MB of additional disk space will be used.
Get:1 http://neurodeb.pirsquared.org bionic/main amd64 git-annex-standalone amd64 7.20190819+git2-g908476a9b-1~ndall+1 [64.1 MB]
Fetched 64.1 MB in 2s (29.8 MB/s)                
Preconfiguring packages ...
Selecting previously unselected package git-annex-standalone.
(Reading database ... 176026 files and directories currently installed.)
Preparing to unpack .../git-annex-standalone_7.20190819+git2-g908476a9b-1~ndall+1_amd64.deb ...
Unpacking git-annex-standalone (7.20190819+git2-g908476a9b-1~ndall+1) ...
Selecting previously unselected package git-remote-gcrypt.
Preparing to unpack .../git-remote-gcrypt_1.0.2-1_all.deb ...
Unpacking git-remote-gcrypt (1.0.2-1) ...
Selecting previously unselected package gitolite3.
Preparing to unpack .../gitolite3_3.6.7-2_all.deb ...
Unpacking gitolite3 (3.6.7-2) ...
Setting up git-remote-gcrypt (1.0.2-1) ...
Setting up git-annex-standalone (7.20190819+git2-g908476a9b-1~ndall+1) ...
Setting up gitolite3 (3.6.7-2) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for mime-support (3.60ubuntu1) ...
Processing triggers for desktop-file-utils (0.23-1ubuntu3.18.04.2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for gnome-menus (3.13.3-11ubuntu1.1) ...

kousu commented 3 years ago

We're going to set up gitolite to be primarily based around "wildrepos", like https://caesr.uwaterloo.ca/wildrepos-in-gitolite/; in other words, to operate as if it were GitHub. Gitolite's initial design assumed that server admins would be closely involved with what went on on their server, but that would be too onerous for us, so instead I'm going to set it up so every repo is managed by its users. See below for the specifics.

  1. Install gitolite-mods/keys:

While I was looking at gitolite, I liked it, except that it wants you to clone and edit gitolite-admin just to manage your users, even though it has CLIs for everything else. It has sskm and ukm that offer this, but one is unmaintained and has a twisty UI, and the other has a twisty UI and is only meant for use by admins; I think both are mistake-prone. So I merged both of them into keys, which has a more straightforward UI. And yes I know I just did this.

root@donnees:/home/nguenther# curl -JL https://raw.githubusercontent.com/kousu/gitolite-mods/master/keys -o /usr/share/gitolite3/commands/keys
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10999  100 10999    0     0  65082      0 --:--:-- --:--:-- --:--:-- 64700
root@donnees:/home/nguenther# ls -l /usr/share/gitolite3/commands/keys
-rw-r--r-- 1 root root 10999 Sep 22 14:20 /usr/share/gitolite3/commands/keys
root@donnees:/home/nguenther# chmod +x /usr/share/gitolite3/commands/keys

  2. Set up the git user

gitolite needs a single system user under which to place all its repositories; this is the same way GitHub works, by the way: whenever you clone a repo over ssh you're always running git clone git@github.com:.....

Make sure the name isn't already taken first:

root@donnees:/home/nguenther# id git
id: ‘git’: no such user
root@donnees:/home/nguenther# useradd -d /srv/git -m git
root@donnees:/home/nguenther# ls /srv/git/
root@donnees:/home/nguenther# ls -ld /srv/git/
drwxr-xr-x 2 git git 4096 Sep 22 14:23 /srv/git/

  3. Bootstrap an admin account:

kousu@ail:~$ scp ~/.ssh/id_ed25519.neuropoly.pub 132.207.65.204:/tmp/nguenther.pub
id_ed25519.neuropoly.pub

  4. Bootstrap gitolite (with the given admin account):

root@donnees:/tmp# sudo -u git -i gitolite setup -pk /tmp/nguenther.pub
Initialized empty Git repository in /srv/git/repositories/gitolite-admin.git/
Initialized empty Git repository in /srv/git/repositories/testing.git/
WARNING: /srv/git/.ssh missing; creating a new one
    (this is normal on a brand new install)
WARNING: /srv/git/.ssh/authorized_keys missing; creating a new one
    (this is normal on a brand new install)

  5. Enable some core gitolite commands.

We need: create, readme, keys, D, and git-annex-shell; also turn on "no-create-on-read":

In

root@donnees:/tmp# vi ~git/.gitolite.rc 

edit like so:

...
    # List of commands and features to enable

    ENABLE => [

        # COMMANDS

            # These are the commands enabled by default
            'help',
            'desc',
            'info',
            'perms',
            'writable',

            # Uncomment or add new commands here.
            'create',
            # 'fork',
            # 'mirror',
            'readme',
            'keys',
            # 'sskm',
            'D',

            'git-annex-shell ua',

            'no-create-on-read',
...

"no-create-on-read" stops gitolite from creating an empty repo when someone tries to git clone a wild repo that doesn't exist yet; instead, the clone errors out, and they need to create the repo locally first and then git push it. I don't know why it's not on by default.

  6. (on your workstation) verify the bootstrapping worked:

kousu@ail:~$ ssh git@132.207.65.204 info
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0

 R W    gitolite-admin
 R W    testing

  7. Configure the repo namespaces:

kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ git clone git@132.207.65.204:gitolite-admin
Cloning into 'gitolite-admin'...
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (6/6), done.
kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cd gitolite-admin/

Edit the config:

kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ vi conf/gitolite.conf 

To read:

@admin = nguenther

# who can control this meta-repo.
repo gitolite-admin
    RW+ = @admin

# wildcard repositories for the users
# users can grant R/W permissions with `ssh git@server perms`
repo CREATOR/..*
    C    = @all
    RW+D = @admin CREATOR
    RW   = WRITERS
    R    = READERS

# semi-public datasets
repo datasets/..*
    C    = @all
    RW+D = @admin CREATOR
    RW  = WRITERS
    R   = @all
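
To get a feel for which repo names land in which namespace, here is a rough emulation of the matching. It is only a sketch under two assumptions: that gitolite substitutes the requesting user's name for CREATOR, and that it anchors the pattern at both ends. The helper names `matches_pattern` and `which_namespace` are made up for illustration, and real gitolite does more checking than this.

```sh
#!/bin/sh
# Sketch: emulate how the wild-repo patterns in gitolite.conf are matched.
# Assumptions: CREATOR is replaced by the requesting user's name and the
# pattern is anchored at both ends. Both helpers are hypothetical.

# matches_pattern USER PATTERN REPO: succeeds if REPO matches PATTERN
matches_pattern() {
    mp_regex=$(printf '%s' "$2" | sed "s/CREATOR/$1/g")
    printf '%s\n' "$3" | grep -Eq "^${mp_regex}\$"
}

# which_namespace USER REPO: prints the first pattern that matches, or "none"
which_namespace() {
    for wn_pattern in 'CREATOR/..*' 'datasets/..*'; do
        if matches_pattern "$1" "$wn_pattern" "$2"; then
            printf '%s\n' "$wn_pattern"
            return 0
        fi
    done
    echo "none"
    return 1
}

which_namespace jcohen jcohen/scratch                 # prints CREATOR/..*
which_namespace jcohen datasets/data-single-subject   # prints datasets/..*
which_namespace jcohen test-permissions || true       # prints none
```

The last call shows a name outside both namespaces: no rule covers it, so a regular user has nowhere to create it.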

Upload:

kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git add -u
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git commit -m "Configure permissions"
[master afb3383] Configure permissions
 1 file changed, 20 insertions(+), 5 deletions(-)
 rewrite conf/gitolite.conf (74%)
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git push
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 528 bytes | 132.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To 132.207.65.204:gitolite-admin
   dccfb34..afb3383  master -> master

  8. Verify you still have admin access and that the right commands work:

kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ ssh git@132.207.65.204 info
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0

 R W C  CREATOR/..*
 R W C  datasets/..*
 R W    gitolite-admin
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ ssh git@132.207.65.204 help
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0

list of remote commands available:

    D
    create
    desc
    help
    info
    keys
    perms
    readme
    writable

Make sure my keys command seems to be behaving:

kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ ssh git@132.207.65.204 keys
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Hello nguenther, you are an admin.

These are all registered keys:
============================
1: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub

  9. Add some users, using keys:

kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cat jcohen@polymtl.ca.pub | ssh git@132.207.65.204 keys add jcohen
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
Added SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub

kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cat andreanne.lemay@polymtl.ca.pub | ssh git@132.207.65.204 keys add andreannelemay
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
Added SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub

kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cat alexandru.foias@polymtl.ca.pub | ssh git@132.207.65.204 keys add alexfoi
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
Added SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alexfoi.pub
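
When there are more than a handful of people to add, it may be easier to generate the `keys add` commands from a list and review them before running anything. A minimal sketch, assuming a plain "name keyfile" list on stdin; `onboard_commands` is a made-up helper, and it only prints the commands rather than running them.

```sh
#!/bin/sh
# Sketch: print (not run) the `keys add` command for each new user.
# onboard_commands is a hypothetical helper; the host and the
# "name keyfile" pairs below are placeholders.
GIT_HOST=${GIT_HOST:-git@data.neuro.polymtl.ca}

# onboard_commands: reads "name keyfile" pairs on stdin, one per line
onboard_commands() {
    while read -r oc_name oc_keyfile; do
        [ -n "$oc_name" ] || continue   # skip blank lines
        printf 'cat %s | ssh %s keys add %s\n' "$oc_keyfile" "$GIT_HOST" "$oc_name"
    done
}

onboard_commands <<'EOF'
jcohen jcohen@polymtl.ca.pub
andreannelemay andreanne.lemay@polymtl.ca.pub
EOF
```

Once the printed lines look right, they can be pasted into a terminal one at a time.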
  10. Grant admin to @jcohenadad and @alexfoias

    -- even with keys there's no way to do this from the CLI, so you have to go into the gitolite-admin repo again and edit the @admin list

kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cd gitolite-admin/
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ vi conf/gitolite.conf 
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git diff
diff --git a/conf/gitolite.conf b/conf/gitolite.conf
index 0366efe..0ad7680 100644
--- a/conf/gitolite.conf
+++ b/conf/gitolite.conf
@@ -1,4 +1,4 @@
-@admin = nguenther
+@admin = nguenther jcohen alexfoi

 # who can control this meta-repo.
 repo gitolite-admin
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git add -u
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git commit -m "Nominate Julien and Alex as server admins"
[master 6727d16] Nominate Julien and Alex as server admins
 1 file changed, 1 insertion(+), 1 deletion(-)
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git push
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
To 132.207.65.204:gitolite-admin
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'git@132.207.65.204:gitolite-admin'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git pull --rebase # necessary because `keys` edited this repo remotely!
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 12 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (12/12), done.
From 132.207.65.204:gitolite-admin
   afb3383..9e64c62  master     -> origin/master
First, rewinding head to replay your work on top of it...
Applying: Nominate Julien and Alex as server admins
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git push
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 401 bytes | 401.00 KiB/s, done.
Total 4 (delta 1), reused 0 (delta 0)
To 132.207.65.204:gitolite-admin
   9e64c62..5def6d5  master -> master

kousu commented 3 years ago

Upload some data just to see.

I didn't keep logs of this part, I'm sorry. I just know what I would have done.

On my laptop:

git clone git@github.com:spine-generic/data-single-subject.git
cd data-single-subject/
git annex init
git annex dead here # disable git-annex's whereis tracking on this local copy. we don't want that cruft uploaded.
git annex get .
git remote add internal git@data.neuro.polymtl.ca:datasets/data-single-subject.git
git push internal --all
git annex copy --to internal

Log in to the server and check that it's there and the annex has been filled in:

du -hs ~git/repositories/datasets/data-single-subject.git/annex

kousu commented 3 years ago

Round II: with a bigger disk

A 1 TB disk was provisioned for the server through the hypervisor (thanks Jean-Sébastien)

nguenther@donnees:~$ sudo fdisk /dev/sdb
[sudo] password for nguenther: 

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/sdb: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: gpt
Disk identifier: 8804AEEA-1B56-4E85-9F98-05D32D5767C6

Device     Start        End    Sectors  Size Type
/dev/sdb1     34      32767      32734   16M Microsoft reserved
/dev/sdb2  32768 2147479551 2147446784 1024G Microsoft basic data

Partition 1 does not start on physical sector boundary.

Command (m for help): m

Help:

  Generic
   d   delete a partition
   F   list free unpartitioned space
   l   list known partition types
   n   add a new partition
   p   print the partition table
   t   change a partition type
   v   verify the partition table
   i   print information about a partition

  Misc
   m   print this menu
   x   extra functionality (experts only)

  Script
   I   load disk layout from sfdisk script file
   O   dump disk layout to sfdisk script file

  Save & Exit
   w   write table to disk and exit
   q   quit without saving changes

  Create a new label
   g   create a new empty GPT partition table
   G   create a new empty SGI (IRIX) partition table
   o   create a new empty DOS partition table
   s   create a new empty Sun partition table

Command (m for help): g

Created a new GPT disklabel (GUID: F39B8299-4E4E-8C4B-96BF-758F07539380).

Command (m for help): n
Partition number (1-128, default 1): 
First sector (2048-2147483614, default 2048): 
Last sector, +sectors or +size{K,M,G,T,P} (2048-2147483614, default 2147483614): 

Created a new partition 1 of type 'Linux filesystem' and of size 1024 GiB.

Command (m for help): p
Disk /dev/sdb: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: gpt
Disk identifier: F39B8299-4E4E-8C4B-96BF-758F07539380

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 2147483614 2147481567 1024G Linux filesystem

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Make the filesystem:

nguenther@donnees:~$ sudo mkfs.ext4 -L "neuropoly-data" /dev/sdb1 

Put it in fstab:

root@donnees# cat >>/etc/fstab <<EOF 
# datasets
UUID=efcb5b7c-0aad-44fd-b949-cc98595e0296 /srv/git/repositories ext4 errors=remount-ro 0 1
EOF

Swap data locations:

git@donnees$ mv repositories repositories.bak
git@donnees$ mkdir repositories
root@donnees# mount /srv/git/repositories
root@donnees# chown -R git:git /srv/git/repositories
git@donnees$ rsync -av repositories.bak/ repositories/

--

There was a nuisance here: because repositories/ is now the root of an ext4 filesystem, it contains a "lost+found/" folder (created by mkfs.ext4, and recreated by fsck if it goes missing), and that shows up in gitolite, e.g.

kousu@ail:~/src/neuropoly/datalad$ ssh git@132.207.65.204 info
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
PTY allocation request failed
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0

 R W C  CREATOR/..*
 R W C  datasets/..*
find: ‘./lost+found’: Permission denied
 R W    datasets/data-single-subject
 R W    gitolite-admin
Shared connection to 132.207.65.204 closed.

Can I make gitolite ignore certain folders? I read the code; it doesn't look like it:

https://github.com/sitaramc/gitolite/blob/f073598c5dca815a5553cebaee3d2a3f7abcbaec/src/lib/Gitolite/Common.pm#L230-L236

so what can I do?

I went with just deleting it and hoping it doesn't come back.

kousu commented 3 years ago

Some Sysadmin Things

Getting mail working on the system:

apt-get install mailutils opensmtpd
echo "buttercups" | mail -s "Testing from data.neuro.polymtl.ca 123" nick@kousu.ca

This mailer is always going to be a bit sketchy since I don't control the DNS so I can't auth it fully; maybe I can set up a relay account for it somewhere?

Anyway this seems to be good enough; @polymtl.ca accepts mails from it, and that should be enough for now.

Automatic Upgrades

(ref: https://wiki.debian.org/UnattendedUpgrades)

dpkg-reconfigure -plow unattended-upgrades

In

vi /etc/apt/apt.conf.d/50unattended-upgrades

Apply these lines:

  1. enable the -updates repo; this is where the bulk of the updates come from in Ubuntu; without this, you will only get the critical security updates, which is too conservative

    // Automatically upgrade packages from these (origin:archive) pairs
    //
    // Note that in Ubuntu security updates may pull in new dependencies
    // from non-security sources (e.g. chromium). By allowing the release
    // pocket these get automatically pulled in.
    Unattended-Upgrade::Allowed-Origins {
        "${distro_id}:${distro_codename}";
        "${distro_id}:${distro_codename}-security";
        // Extended Security Maintenance; doesn't necessarily exist for
        // every release and this system may not have it installed, but if
        // available, the policy for updates is such that unattended-upgrades
        // should also install from here by default.
        "${distro_id}ESMApps:${distro_codename}-apps-security";
        "${distro_id}ESM:${distro_codename}-infra-security";
        "${distro_id}:${distro_codename}-updates";
    //      "${distro_id}:${distro_codename}-proposed";
        "${distro_id}:${distro_codename}-backports";
    };
  2. Set Unattended-Upgrade::Mail "root@localhost";

And echo you@yourself >> ~root/.forward to make sure you receive these messages, plus make sure mail is working on the system!

  3. Clean up

    Unattended-Upgrade::Remove-Unused-Dependencies "true";
  4. Enable automatic reboot

    Unattended-Upgrade::Automatic-Reboot "true";
    Unattended-Upgrade::Automatic-Reboot-WithUsers "true";
    Unattended-Upgrade::Automatic-Reboot-Time "02:00";
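
To see what the Allowed-Origins patterns from the first step turn into on a given release, here is a small expansion sketch. It assumes `${distro_id}` and `${distro_codename}` are filled in from `lsb_release -is` and `lsb_release -cs`; the values are hard-coded here for an 18.04 machine, and `expand_origin` is a made-up helper.

```sh
#!/bin/sh
# Sketch: show what the Allowed-Origins patterns expand to.
# Assumption: ${distro_id} and ${distro_codename} come from
# `lsb_release -is` and `lsb_release -cs`; they are hard-coded below.
# expand_origin is a hypothetical helper, not part of unattended-upgrades.

# expand_origin ID CODENAME PATTERN
expand_origin() {
    printf '%s\n' "$3" |
        sed -e "s/\${distro_id}/$1/g" -e "s/\${distro_codename}/$2/g"
}

for pattern in \
    '${distro_id}:${distro_codename}' \
    '${distro_id}:${distro_codename}-security' \
    '${distro_id}:${distro_codename}-updates' \
    '${distro_id}:${distro_codename}-backports'
do
    expand_origin Ubuntu bionic "$pattern"
done
```

The expanded pairs can be compared against the o= (origin) and a= (archive) fields that `apt-cache policy` prints for each configured repository.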

Monitoring

iotop and htop are good for checking where your disk and CPU are being eaten. netdata is an excellent monitoring tool that also emails you when its alarms trip.

sudo apt-get install htop iotop netdata
sudo systemctl enable netdata
cat >> /etc/netdata/netdata.conf <<EOF
    # the database size - 1 week
    memory mode = save
    update every = 15
    history = 40000
EOF

The netdata config is to make sure it keeps a week of history instead of the default 1 hour.
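
The "week" figure is just the number of stored points times the seconds between points. A quick worked check, assuming the defaults are history = 3600 and update every = 1 (which is what the "1 hour" above corresponds to); `retention_seconds` is a made-up helper.

```sh
#!/bin/sh
# Worked arithmetic for the retention settings above.
# retention_seconds HISTORY UPDATE_EVERY: points kept times seconds per point
# (hypothetical helper; the default values below are an assumption)
retention_seconds() {
    echo $(( $1 * $2 ))
}

secs=$(retention_seconds 40000 15)
echo "retention: $secs s, about $(( secs / 3600 )) h"   # 600000 s, about 166 h
echo "default:   $(retention_seconds 3600 1) s"         # 3600 s, i.e. 1 hour
```

166 hours is a little under 7 days, so "a week" is a slight round-up.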

kousu commented 3 years ago

After running like this for a while, I noticed that git-annex was only v7, and decided to try to get the latest git-annex.

I didn't keep my logs of doing this, but roughly I did:

  1. apt-get purge git-annex-standalone
  2. apt-get purge neurodebian # or maybe edited /etc/apt/sources.list manually? I forget.
  3. apt-get purge [ most x11 programs, firefox, a bunch of other things we don't need ] && sudo apt-get purge ubuntu-desktop # maybe not quite like this? there was a lot of tinkering to cull unused apps
  4. sudo apt-get install ubuntu-server
  5. sudo do-release-upgrade # to upgrade to Ubuntu 20.04-LTS
  6. apt-get install git-annex

At this point I checked: I still had git-annex v6. Even with the newest Ubuntu LTS? Rude!

But recently Ubuntu 20.10 was released, with git-annex v8 in its repos, so I kept going:

  1. Edit /etc/update-manager/release-upgrades to set Prompt=normal instead of Prompt=lts
  2. sudo do-release-upgrade # to upgrade to Ubuntu 20.10
  3. sudo apt-get install git-annex
  4. git-annex version should report "v8".
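
If this check ever needs to be scripted, the major version can be pulled out of the first line of the output. A sketch only: `annex_major` is a made-up helper, it assumes the first line looks like "git-annex version: 8.x", and the demo feeds it a canned line whose exact version string is illustrative.

```sh
#!/bin/sh
# Sketch: extract the major version from `git annex version` output.
# annex_major is a hypothetical helper; it assumes the first line has the
# form "git-annex version: MAJOR.REST".
annex_major() {
    sed -n 's/^git-annex version: \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Demo on a canned first line (the version string here is illustrative).
printf 'git-annex version: 8.20200226\n' | annex_major    # prints 8
```

On the server itself the equivalent would be `git annex version | annex_major`.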
kousu commented 3 years ago

Changing the hostname

From a declaration by @jcohenadad, the hostname is now data.neuro.polymtl.ca. I changed it in /etc/hostname and nagged IT to fix up the DNS for us, which happened eventually.

kousu commented 3 years ago

Migrating smb://duke/sct_testing/large to git+annex://data.neuro.polymtl.ca/datasets/

  1. On data, get a clean copy of the dataset:

nguenther@data$ sudo mount -t cifs -o username=u[REDACTED],noexec //duke.neuro.polymtl.ca/sct_testing /mnt/duke/sct_testing
nguenther@data$ mkdir datasets
nguenther@data$ cd datasets

nguenther@data$ time rsync -av  --exclude ".*" /mnt/duke/sct_testing/large .  # this took ~10 minutes to copy ~20GB.

The reason for --exclude ".*" is that this dataset had already been put under git-annex via datalad, but with an older version, and it wasn't handled well because it was run directly(!) over smb. So I'm just going to throw out the old git log (and git-annex) and try again.

  2. Import the dataset to git-annex

Reference: from the last dataset I did this to: https://github.com/spine-generic/data-multi-subject_DO-NOT-USE/issues/20

git config --global user.name "Nick Guenther"
git config --global user.email "nick.guenther@polymtl.ca"
git init
cat > .gitattributes <<EOF
*.nii     filter=annex annex.largefiles=anything
*.nii.gz  filter=annex annex.largefiles=anything
EOF
git add README.txt && git commit -m "Initial commit"
git annex init
git add .gitattributes && git commit -m "Configure git-annex"
time git add .  # ~10 minutes to hash and copy the entire dataset into .git/annex/objects/
git commit -m "Migrate dataset from smb://duke/sct_testing/large to git-annex."
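
Before committing a big dataset, it can be worth asking git which files the .gitattributes rules will actually route to the annex. A sketch using plain `git check-attr` in a throwaway repo, so it does not need git-annex installed; the file names are made up and do not need to exist, and `largefiles_attr` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: ask git which paths the .gitattributes rules send to the annex.
# Runs in a throwaway repo ($demo); remove that directory when done.
demo=$(mktemp -d)
git -C "$demo" init -q
cat > "$demo/.gitattributes" <<'EOF'
*.nii     filter=annex annex.largefiles=anything
*.nii.gz  filter=annex annex.largefiles=anything
EOF

# largefiles_attr PATH: prints the annex.largefiles value git resolves for PATH
# (hypothetical helper; PATH is relative to the repo root)
largefiles_attr() {
    git -C "$demo" check-attr annex.largefiles -- "$1" | sed 's/.*: //'
}

largefiles_attr sub-01/anat/sub-01_T1w.nii.gz   # prints: anything
largefiles_attr README.txt                      # prints: unspecified
```

Anything that reports "unspecified" has no rule here, so where it ends up depends on git-annex's own defaults rather than on this file.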
  3. Upload:

git remote add origin git@data.neuro.polymtl.ca:datasets/sct_testing/large.git
git annex sync --content origin

EDIT: from review by @jcohenadad , I moved this to git@data.neuro.polymtl.ca:datasets/sct-testing-large.git. It turned out I could do this just by:

git@data$ cd ~/repositories/datasets
git@data$ mv sct_testing/large.git sct-testing-large.git

kousu commented 3 years ago

Okay we've fixed the outage, finally. Now back to actually documenting and using this.

kousu commented 3 years ago

@jcohenadad asked me to set up protected branches.

I just need to do a bit of experimenting to get this to work right.

Subtask: #27.

kousu commented 3 years ago

I also want to look into how permissions are persisted between repositories. In gitolite, whoever uploads a repo first is special: they are the CREATOR, and they can delegate permissions on branches to other users.

I have concerns with this like: what happens if that person goes missing?

kousu commented 3 years ago

We also need to set up backups but I'm tracking that in #20 .

kousu commented 3 years ago

An important tweak: /etc/fstab now automounts the data drive, to prevent a crash there from killing the whole system again:

root@donnees# cat >>/etc/fstab <<EOF 
# datasets
UUID=<whatever the new filesystem ID is> /srv/git/repositories ext4 errors=remount-ro,noauto,x-systemd.automount 0 1
EOF
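
A quick sanity check that the entry really carries both options can be scripted with awk. This is a sketch: `has_automount` is a made-up helper, the UUID is a dummy value, and the demo runs against a temporary file rather than the real /etc/fstab.

```sh
#!/bin/sh
# Sketch: check that an fstab entry carries the automount options.
# has_automount is a hypothetical helper; the UUID below is a dummy value.

# has_automount FSTAB MOUNTPOINT: succeeds if MOUNTPOINT's options include
# both noauto and x-systemd.automount
has_automount() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) seen[opts[i]] = 1
        }
        END { exit !(("noauto" in seen) && ("x-systemd.automount" in seen)) }
    ' "$1"
}

fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# datasets
UUID=00000000-0000-0000-0000-000000000000 /srv/git/repositories ext4 errors=remount-ro,noauto,x-systemd.automount 0 1
EOF

if has_automount "$fstab" /srv/git/repositories; then
    echo "automount options present"
else
    echo "automount options missing"
fi
rm -f "$fstab"
```

On the server the same check would be `has_automount /etc/fstab /srv/git/repositories`.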
kousu commented 3 years ago

I removed "create" from the ENABLE list, following feedback from @joshuacwnewton:

```
root@data:~# vi ~git/.gitolite.rc
```

edit like so:

```
...
    # List of commands and features to enable

    ENABLE => [

        # COMMANDS

            # These are the commands enabled by default
            'help',
            'desc',
            'info',
            'perms',
            'writable',

            # Uncomment or add new commands here.
            # 'create',
            # 'fork',
            # 'mirror',
            'readme',
            'keys',
            # 'sskm',
            'D',

            'git-annex-shell ua',

            'no-create-on-read',
...
```
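
To double-check an edit like this, it helps to list which entries are actually uncommented. A rough sketch: `list_enabled` is a made-up helper that simply prints quoted entries not hidden behind a `#`, and the demo runs on a throwaway excerpt rather than the real ~git/.gitolite.rc (which may contain other quoted lines outside the ENABLE block).

```sh
#!/bin/sh
# Sketch only: list_enabled is a hypothetical helper, not part of gitolite.
# It prints every quoted entry that is not commented out, one per line.
list_enabled() {
    # keep lines that start (after whitespace) with a single quote,
    # then strip the quotes and everything after the closing quote
    grep "^[[:space:]]*'" "$1" | sed "s/^[[:space:]]*'\([^']*\)'.*/\1/"
}

# Demo on a throwaway excerpt instead of the real rc file.
rc=$(mktemp)
cat > "$rc" <<'EOF'
            # 'create',
            # 'fork',
            'readme',
            'keys',
            # 'sskm',
            'D',
            'git-annex-shell ua',
            'no-create-on-read',
EOF

list_enabled "$rc"   # create, fork and sskm should not appear
rm -f "$rc"
```

For the commands specifically, `ssh git@data.neuro.polymtl.ca help` gives the server's own view of what is enabled.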

That leaves only two ways to make a new repo:

  1. ~ssh git@data.neuro.polymtl.ca create~
  2. someone ssh's into the server and makes it there directly
  3. git push

But now there's an awkward asymmetry, because we have:

(side issue: can we rename D to delete?)

joshuacwnewton commented 3 years ago

> I removed "create" from the ENABLED list, from feedback from @joshuacwnewton:

Just to clarify, I suggested this because (unless I'm missing something), regular users weren't able to ssh git@data.neuro.polymtl.ca create anyway:

joshua@XPS-15-9560:~$ ssh git@data.neuro.polymtl.ca create test-permissions
FATAL: repo already exists or you are not authorised to create it

Or, was I supposed to do ssh git@data.neuro.polymtl.ca create datasets/<repo-name> because that's the directory I have permissions for?

kousu commented 3 years ago

This is in production now and has been for a while.