Closed kousu closed 3 years ago
The first debate was what software to use: Gitlab, Gitea, or Gitolite.
I went with Gitolite because it has no web interface. Web interfaces are handy for users, but a nightmare to maintain, especially since this internal server isn't going to have a web presence, so getting a cert for it from Let's Encrypt is extremely tricky. Since most of our interaction with the server will be over ssh anyway, especially since it's going to need git-annex in the mix, I think this is okay, and then we sidestep the need for certs because ssh has its own TOFU-based host authentication.
Also, Gitlab removed git-annex support, which is fine, I know they have their reasons, but that makes it incompatible with datalad so that's right out for us.
Here's a pretty good endorsement from a lab much like ours: https://caesr.uwaterloo.ca/wildrepos-in-gitolite/
The first step was to name the server:
echo donnees.neuro.polymtl.ca > /etc/hostname
And to ask IT at Polytechnique to set up DNS records (A and PTRs) to match.
Next was to get git, gitolite and git-annex installed. git-annex needs a server component; it's a smaller server component than git-lfs needs, but it's still there, and since this is Ubuntu 18.04 its git-annex is way too old, so we need to install git-annex from elsewhere (I picked neurodebian) instead.
These instructions were from http://neuro.debian.net/install_pkg.html?p=git-annex-standalone:
root@donnees:/home/nguenther# wget -O- http://neuro.debian.net/lists/bionic.us-ca.full | sudo tee /etc/apt/sources.list.d/neurodebian.sources.list
--2020-09-22 14:15:35-- http://neuro.debian.net/lists/bionic.us-ca.full
Resolving neuro.debian.net (neuro.debian.net)... 129.170.233.11
Connecting to neuro.debian.net (neuro.debian.net)|129.170.233.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 262
Saving to: ‘STDOUT’
deb http://neurodeb.pirsquared.org data main contrib non-free ] 0 --.-KB/s
#deb-src http://neurodeb.pirsquared.org data main contrib non-free
deb http://neurodeb.pirsquared.org bionic main contrib non-free
#deb-src http://neurodeb.pirsquared.org bionic main contrib non-free
- 100%[=======================================================================>] 262 --.-KB/s in 0s
2020-09-22 14:15:35 (32.9 MB/s) - written to stdout [262/262]
root@donnees:/home/nguenther# sudo apt-key adv --recv-keys --keyserver hkp://pool.sks-keyservers.net:80 0xA5D32F012649A5A9
Executing: /tmp/apt-key-gpghome.m1EYN96iWP/gpg.1.sh --recv-keys --keyserver hkp://pool.sks-keyservers.net:80 0xA5D32F012649A5A9
gpg: key A5D32F012649A5A9: 7 signatures not checked due to missing keys
gpg: key A5D32F012649A5A9: "NeuroDebian Archive Key <pkg-exppsy-maintainers@lists.alioth.debian.org>" 1 new user ID
gpg: key A5D32F012649A5A9: "NeuroDebian Archive Key <pkg-exppsy-maintainers@lists.alioth.debian.org>" 8 new signatures
gpg: Total number processed: 1
gpg: new user IDs: 1
gpg: new signatures: 8
Now we can install the three parts:
root@donnees:/home/nguenther# DEBIAN_FRONTEND=noninteractive apt-get install -y git git-annex-standalone gitolite3
Reading package lists... Done
Building dependency tree
Reading state information... Done
git is already the newest version (1:2.24.0-1~nd18.04+1).
Suggested packages:
xdot bup adb tor magic-wormhole tahoe-lafs uftp git-daemon-sysvinit gitweb
The following NEW packages will be installed:
git-annex-standalone git-remote-gcrypt gitolite3
0 upgraded, 3 newly installed, 0 to remove and 15 not upgraded.
Need to get 64.1 MB/64.3 MB of archives.
After this operation, 189 MB of additional disk space will be used.
Get:1 http://neurodeb.pirsquared.org bionic/main amd64 git-annex-standalone amd64 7.20190819+git2-g908476a9b-1~ndall+1 [64.1 MB]
Fetched 64.1 MB in 2s (29.8 MB/s)
Preconfiguring packages ...
Selecting previously unselected package git-annex-standalone.
(Reading database ... 176026 files and directories currently installed.)
Preparing to unpack .../git-annex-standalone_7.20190819+git2-g908476a9b-1~ndall+1_amd64.deb ...
Unpacking git-annex-standalone (7.20190819+git2-g908476a9b-1~ndall+1) ...
Selecting previously unselected package git-remote-gcrypt.
Preparing to unpack .../git-remote-gcrypt_1.0.2-1_all.deb ...
Unpacking git-remote-gcrypt (1.0.2-1) ...
Selecting previously unselected package gitolite3.
Preparing to unpack .../gitolite3_3.6.7-2_all.deb ...
Unpacking gitolite3 (3.6.7-2) ...
Setting up git-remote-gcrypt (1.0.2-1) ...
Setting up git-annex-standalone (7.20190819+git2-g908476a9b-1~ndall+1) ...
Setting up gitolite3 (3.6.7-2) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for mime-support (3.60ubuntu1) ...
Processing triggers for desktop-file-utils (0.23-1ubuntu3.18.04.2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for gnome-menus (3.13.3-11ubuntu1.1) ...
We're going to set up gitolite to be primarily based around "wildrepos", like https://caesr.uwaterloo.ca/wildrepos-in-gitolite/, or in other words: to operate as if it was Github. Gitolite's initial design assumed that server admins would be closely involved with what went on on their server; but that'll be too onerous for us, so instead I'm going to set it up so every repo is managed by its users. See below for the specifics.
gitolite-mods/keys: While I was looking at gitolite, I liked it, except I didn't like that it wanted you to clone and edit gitolite-admin just to manage your users, especially when it has CLIs for everything else. It has sskm and ukm that offer this, but one is unmaintained and has a twisty UI, and the other has a twisty UI and is only meant for use by admins; I think both are mistake-prone. So I merged both of them into keys, which has a more straightforward UI. And yes I know I just did this.
root@donnees:/home/nguenther# curl -JL https://raw.githubusercontent.com/kousu/gitolite-mods/master/keys -o /usr/share/gitolite3/commands/keys
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 10999 100 10999 0 0 65082 0 --:--:-- --:--:-- --:--:-- 64700
root@donnees:/home/nguenther# ls -l /usr/share/gitolite3/commands/keys
-rw-r--r-- 1 root root 10999 Sep 22 14:20 /usr/share/gitolite3/commands/keys
root@donnees:/home/nguenther# chmod +x /usr/share/gitolite3/commands/keys
gitolite needs a single system user within which to place all its repositories; this is the same way github works, by the way: whenever you clone a repo over ssh you're always going git clone git@github.com:....
Make sure it's out of the way first:
root@donnees:/home/nguenther# id git
id: ‘git’: no such user
root@donnees:/home/nguenther# useradd -d /srv/git -m git
root@donnees:/home/nguenther# ls /srv/git/
root@donnees:/home/nguenther# ls -ld /srv/git/
drwxr-xr-x 2 git git 4096 Sep 22 14:23 /srv/git/
kousu@ail:~$ scp ~/.ssh/id_ed25519.neuropoly.pub 132.207.65.204:/tmp/nguenther.pub
id_ed25519.neuropoly.pub
root@donnees:/tmp# sudo -u git -i gitolite setup -pk /tmp/nguenther.pub
Initialized empty Git repository in /srv/git/repositories/gitolite-admin.git/
Initialized empty Git repository in /srv/git/repositories/testing.git/
WARNING: /srv/git/.ssh missing; creating a new one
(this is normal on a brand new install)
WARNING: /srv/git/.ssh/authorized_keys missing; creating a new one
(this is normal on a brand new install)
We need: create, readme, keys, D, and git-annex; also turn on "no-create-on-read":
In
root@donnees:/tmp# vi ~git/.gitolite.rc
edit like so:
...
# List of commands and features to enable
ENABLE => [
# COMMANDS
# These are the commands enabled by default
'help',
'desc',
'info',
'perms',
'writable',
# Uncomment or add new commands here.
'create',
# 'fork',
# 'mirror',
'readme',
'keys',
# 'sskm',
'D',
'git-annex-shell ua',
'no-create-on-read',
...
"no-create-on-read" stops gitolite from handing out a freshly created empty repo when someone git clones a wildcard repo that doesn't exist yet; instead, it errors, and they need to create it locally first and then git push. I don't know why it's not on by default.
kousu@ail:~$ ssh git@132.207.65.204 info
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0
R W gitolite-admin
R W testing
kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ git clone git@132.207.65.204:gitolite-admin
Cloning into 'gitolite-admin'...
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (6/6), done.
kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cd gitolite-admin/
Edit the config:
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ vi conf/gitolite.conf
To read:
@admin = nguenther
# who can control this meta-repo.
repo gitolite-admin
RW+ = @admin
# wildcard repositories for the users
# users can grant R/W permissions with `ssh git@server perms`
repo CREATOR/..*
C = @all
RW+D = @admin CREATOR
RW = WRITERS
R = READERS
# semi-public datasets
repo datasets/..*
C = @all
RW+D = @admin CREATOR
RW = WRITERS
R = @all
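To make the two wildcard rules concrete, here is a rough sketch of which repository names a given user may create under this config. It assumes gitolite substitutes the requesting user's name for CREATOR and anchors each pattern at both ends. The function name can_create and the example repo names are made up for illustration, and user names containing regex metacharacters are not handled.

```shell
# can_create USER REPO
# Succeeds if REPO matches one of the two wildcard patterns above,
# with CREATOR replaced by USER. Illustration only; gitolite does the real check.
can_create() {
    user=$1
    repo=$2
    printf '%s\n' "$repo" | grep -Eq "^(${user}|datasets)/..*\$"
}

can_create nguenther nguenther/scratch            && echo "allowed: nguenther/scratch"
can_create nguenther datasets/data-single-subject && echo "allowed: datasets/data-single-subject"
can_create nguenther test-permissions             || echo "refused: test-permissions"
```

A bare name with no prefix matches neither pattern, so a repo has to live under the user's own name or under datasets/.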
Upload:
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git add -u
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git commit -m "Configure permissions"
[master afb3383] Configure permissions
1 file changed, 20 insertions(+), 5 deletions(-)
rewrite conf/gitolite.conf (74%)
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git push
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 528 bytes | 132.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To 132.207.65.204:gitolite-admin
dccfb34..afb3383 master -> master
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ ssh git@132.207.65.204 info
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0
R W C CREATOR/..*
R W C datasets/..*
R W gitolite-admin
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ ssh git@132.207.65.204 help
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0
list of remote commands available:
D
create
desc
help
info
keys
perms
readme
writable
Make sure my keys command seems to be behaving:
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ ssh git@132.207.65.204 keys
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
Hello nguenther, you are an admin.
These are all registered keys:
============================
1: SHA256:EBfMaqmOuoXeNU7BGuDm2S07tgZgdkuEBMAQlmV3fAI : nguenther.pub
Add the other users with keys:
kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cat jcohen@polymtl.ca.pub | ssh git@132.207.65.204 keys add jcohen
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
Added SHA256:Ss3ePRjzwzjZAUYmqItooySyJdtd2UvlqbDZ5UaIAHo : jcohen.pub
kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cat andreanne.lemay@polymtl.ca.pub | ssh git@132.207.65.204 keys add andreannelemay
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
Added SHA256:AZp8tEp8yJKivYB91wPWqRyVIQm3SzlJYk7PlPv26o8 : andreannelemay.pub
kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cat alexandru.foias@polymtl.ca.pub | ssh git@132.207.65.204 keys add alexfoi
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
please supply the new key on STDIN (e.g. cat you.pub | ssh gitolite@git.example.com keys add @laptop).
Added SHA256:BZcsg/BfyQ27pIOSFw94ZiBmTKGHJ7Qy/Vqww/x5ujQ : alexfoi.pub
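Adding several people this way gets repetitive, so a loop can help. This is a minimal sketch, assuming every key file is named <email>.pub and that the gitolite user name should be the part before the @ with dots removed. That naming rule is an assumption (some of the names above were picked by hand), and the helper names key_user and add_keys and the DRY_RUN switch are hypothetical.

```shell
# key_user FILE: derive a gitolite user name from an "<email>.pub" file name,
# e.g. andreanne.lemay@polymtl.ca.pub -> andreannelemay
key_user() {
    name=$(basename "$1")
    name=${name%.pub}                       # drop the .pub suffix
    name=${name%%@*}                        # keep the part before the first "@"
    printf '%s\n' "$name" | tr -d '.'      # drop dots
}

# add_keys SERVER FILE...: feed each public key to `keys add` on stdin.
# With DRY_RUN=1 it only prints what it would run.
add_keys() {
    server=$1
    shift
    for f in "$@"; do
        u=$(key_user "$f")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $server keys add $u < $f"
        else
            ssh "$server" keys add "$u" < "$f"
        fi
    done
}

# Print the commands without contacting the server
DRY_RUN=1 add_keys git@132.207.65.204 jcohen@polymtl.ca.pub andreanne.lemay@polymtl.ca.pub
```

Running it with DRY_RUN=1 first makes it easy to eyeball the derived names before anything is registered.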
Grant admin to @jcohenadad and @alexfoias. Even with keys there's no way to do this from the CLI, so you have to go into the gitolite-admin repo again and edit the @admin list:
kousu@ail:~/work/neuropoly/projects/datalad/gitolite$ cd gitolite-admin/
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ vi conf/gitolite.conf
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git diff
diff --git a/conf/gitolite.conf b/conf/gitolite.conf
index 0366efe..0ad7680 100644
--- a/conf/gitolite.conf
+++ b/conf/gitolite.conf
@@ -1,4 +1,4 @@
-@admin = nguenther
+@admin = nguenther jcohen alexfoi
# who can control this meta-repo.
repo gitolite-admin
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git add -u
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git commit -m "Nominate Julien and Alex as server admins"
[master 6727d16] Nominate Julien and Alex as server admins
1 file changed, 1 insertion(+), 1 deletion(-)
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git push
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
To 132.207.65.204:gitolite-admin
! [rejected] master -> master (fetch first)
error: failed to push some refs to 'git@132.207.65.204:gitolite-admin'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git pull --rebase # necessary because `keys` edited this repo remotely!
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 12 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (12/12), done.
From 132.207.65.204:gitolite-admin
afb3383..9e64c62 master -> origin/master
First, rewinding head to replay your work on top of it...
Applying: Nominate Julien and Alex as server admins
kousu@ail:~/work/neuropoly/projects/datalad/gitolite/gitolite-admin$ git push
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 401 bytes | 401.00 KiB/s, done.
Total 4 (delta 1), reused 0 (delta 0)
To 132.207.65.204:gitolite-admin
9e64c62..5def6d5 master -> master
Upload some data just to see.
I didn't keep logs of this part, I'm sorry. I just know what I would have done.
On my laptop:
git clone git@github.com:spine-generic/data-single-subject.git
cd data-single-subject/
git annex init
git annex dead here # disable git-annex's whereis tracking on this local copy. we don't want that cruft uploaded.
git annex get .
git remote add internal git@data.neuro.polymtl.ca:datasets/data-single-subject.git
git push internal --all
git annex copy --to internal
Log in to the server and check that it's there and the annex has been filled in:
du -hs ~git/repositories/datasets/data-single-subject.git/annex
A 1 TB disk was provisioned onto the server by the hypervisor (thanks Jean-Sébastien):
nguenther@donnees:~$ sudo fdisk /dev/sdb
[sudo] password for nguenther:
Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): p
Disk /dev/sdb: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: gpt
Disk identifier: 8804AEEA-1B56-4E85-9F98-05D32D5767C6
Device Start End Sectors Size Type
/dev/sdb1 34 32767 32734 16M Microsoft reserved
/dev/sdb2 32768 2147479551 2147446784 1024G Microsoft basic data
Partition 1 does not start on physical sector boundary.
Command (m for help): m
Help:
Generic
d delete a partition
F list free unpartitioned space
l list known partition types
n add a new partition
p print the partition table
t change a partition type
v verify the partition table
i print information about a partition
Misc
m print this menu
x extra functionality (experts only)
Script
I load disk layout from sfdisk script file
O dump disk layout to sfdisk script file
Save & Exit
w write table to disk and exit
q quit without saving changes
Create a new label
g create a new empty GPT partition table
G create a new empty SGI (IRIX) partition table
o create a new empty DOS partition table
s create a new empty Sun partition table
Command (m for help): g
Created a new GPT disklabel (GUID: F39B8299-4E4E-8C4B-96BF-758F07539380).
Command (m for help): n
Partition number (1-128, default 1):
First sector (2048-2147483614, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-2147483614, default 2147483614):
Created a new partition 1 of type 'Linux filesystem' and of size 1024 GiB.
Command (m for help): p
Disk /dev/sdb: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: gpt
Disk identifier: F39B8299-4E4E-8C4B-96BF-758F07539380
Device Start End Sectors Size Type
/dev/sdb1 2048 2147483614 2147481567 1024G Linux filesystem
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
Make the filesystem:
nguenther@donnees:~$ sudo mkfs.ext4 -L "neuropoly-data" /dev/sdb1
Put it in fstab:
root@donnees# cat >>/etc/fstab <<EOF
# datasets
UUID=efcb5b7c-0aad-44fd-b949-cc98595e0296 /srv/git/repositories ext4 errors=remount-ro 0 1
EOF
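The UUID in that entry is the one mkfs.ext4 reports, and it can also be read back with blkid. Here is a small sketch, with a hypothetical helper name fstab_entry, that builds a line of the same shape from a UUID so it doesn't have to be pasted by hand:

```shell
# fstab_entry UUID MOUNTPOINT [OPTIONS]
# Print an ext4 fstab line in the same shape as the one above.
fstab_entry() {
    printf 'UUID=%s %s ext4 %s 0 1\n' "$1" "$2" "${3:-errors=remount-ro}"
}

fstab_entry efcb5b7c-0aad-44fd-b949-cc98595e0296 /srv/git/repositories

# On the server, as root, the UUID can be looked up instead of typed:
#   fstab_entry "$(blkid -s UUID -o value /dev/sdb1)" /srv/git/repositories >> /etc/fstab
```

The optional third argument is there so extra mount options can be passed without editing the helper.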
Swap data locations:
git@donnees$ mv repositories repositories.bak
git@donnees$ mkdir repositories
root@donnees# mount /srv/git/repositories
root@donnees# chown -R git:git /srv/git/repositories
git@donnees$ rsync -av repositories.bak/ repositories/
--
There was a nuisance here: because repositories/ is now the root of a filesystem, it contains a "lost+found/" folder after fscking, and that shows up in gitolite, e.g.:
kousu@ail:~/src/neuropoly/datalad$ ssh git@132.207.65.204 info
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
PTY allocation request failed
hello nguenther, this is git@donnees running gitolite3 3.6.7-2 (Debian) on git 2.24.0
R W C CREATOR/..*
R W C datasets/..*
find: ‘./lost+found’: Permission denied
R W datasets/data-single-subject
R W gitolite-admin
Shared connection to 132.207.65.204 closed.
Can I make gitolite ignore certain folders? I read the code; it doesn't look like it. So what can I do?
I went with just deleting it and hoping it doesn't come back.
apt-get install mailutils opensmtpd
echo "buttercups" | mail -s "Testing from data.neuro.polymtl.ca 123" nick@kousu.ca
This mailer is always going to be a bit sketchy since I don't control the DNS so I can't auth it fully; maybe I can set up a relay account for it somewhere?
Anyway this seems to be good enough; @polymtl.ca accepts mails from it, and that should be enough for now.
(ref: https://wiki.debian.org/UnattendedUpgrades)
dpkg-reconfigure -plow unattended-upgrades
In
vi /etc/apt/apt.conf.d/50unattended-upgrades
Apply these lines:
Enable the -updates repo; this is where the bulk of Ubuntu's updates arrive. Without it, you will only get the critical security updates, which is too conservative:
// Automatically upgrade packages from these (origin:archive) pairs
//
// Note that in Ubuntu security updates may pull in new dependencies
// from non-security sources (e.g. chromium). By allowing the release
// pocket these get automatically pulled in.
Unattended-Upgrade::Allowed-Origins {
"${distro_id}:${distro_codename}";
"${distro_id}:${distro_codename}-security";
// Extended Security Maintenance; doesn't necessarily exist for
// every release and this system may not have it installed, but if
// available, the policy for updates is such that unattended-upgrades
// should also install from here by default.
"${distro_id}ESMApps:${distro_codename}-apps-security";
"${distro_id}ESM:${distro_codename}-infra-security";
"${distro_id}:${distro_codename}-updates";
// "${distro_id}:${distro_codename}-proposed";
"${distro_id}:${distro_codename}-backports";
};
Set Unattended-Upgrade::Mail "root@localhost";
And echo you@yourself >> ~root/.forward to make sure you receive these messages, plus make sure mail is working on the system!
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-WithUsers "true";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
iotop and htop are good for checking where your disk and CPU are being eaten. netdata is a great monitoring tool that also emails you when its alarms trip.
sudo apt-get install htop iotop netdata
sudo systemctl enable netdata
cat >> /etc/netdata/netdata.conf <<EOF
# the database size - 1 week
memory mode = save
update every = 15
history = 40000
EOF
The netdata config is to make sure it keeps a week of history instead of the default 1 hour.
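The arithmetic behind those two numbers: netdata keeps "history" samples, one every "update every" seconds, so the retention window is their product. A quick check in shell (the function name is just for illustration):

```shell
# retention_seconds HISTORY UPDATE_EVERY: how long netdata's history covers
retention_seconds() {
    echo $(( $1 * $2 ))
}

secs=$(retention_seconds 40000 15)
echo "$secs seconds"                 # 600000 seconds
echo "$(( secs / 3600 )) hours"      # 166 hours, just under the 168 in a week
```

So the setting gives a little under seven days, at the cost of one sample per 15 seconds instead of one per second.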
After running like this for a while, I noticed that git-annex was only v7, and decided to try to get the latest git-annex.
I didn't keep my logs of doing this, but roughly I did:
apt-get purge git-annex-standalone
apt-get purge neurodebian # or maybe edited /etc/apt/sources.list manually? I forget.
apt-get purge [ most x11 programs, firefox, a bunch of other things we don't need ] && sudo apt-get purge ubuntu-desktop # maybe not quite like this? there was a lot of tinkering to cull unused apps
sudo apt-get install ubuntu-server
sudo do-release-upgrade # to upgrade to Ubuntu 20.04-LTS
apt-get install git-annex
At this point I checked: I still had git-annex v6. Even with the newest Ubuntu LTS? Rude!
But recently Ubuntu 20.10 was released, with git-annex v8 in its repos, so I kept going:
In /etc/update-manager/release-upgrades, set Prompt=normal instead of Prompt=lts, then:
sudo do-release-upgrade # to upgrade to Ubuntu 20.10
sudo apt-get install git-annex
git-annex version should report "v8".
From a declaration by @jcohenadad, the hostname is now data.neuro.polymtl.ca. I changed it in /etc/hostname and nagged IT to fix up the DNS for us, which happened eventually.
On data, get a clean copy of the dataset:
nguenther@data$ sudo mount -t cifs -o username=u[REDACTED],noexec //duke.neuro.polymtl.ca/sct_testing /mnt/duke/sct_testing
nguenther@data$ mkdir datasets
nguenther@data$ cd datasets
nguenther@data$ time rsync -av --exclude ".*" /mnt/duke/sct_testing/large . # this took ~10minutes to copy ~20GB.
The reason for --exclude ".*" is that this dataset had already been put under git-annex via datalad, but via an older version, and it wasn't well handled because it was run directly(!) over smb. So I'm just going to throw out the old git log (and git-annex) and try again.
Reference: from the last dataset I did this to: https://github.com/spine-generic/data-multi-subject_DO-NOT-USE/issues/20
git config --global user.name "Nick Guenther"
git config --global user.email "nick.guenther@polymtl.ca"
git init
cat > .gitattributes <<EOF
*.nii filter=annex annex.largefiles=anything
*.nii.gz filter=annex annex.largefiles=anything
EOF
git add README.txt && git commit -m "Initial commit"
git annex init
git add .gitattributes && git commit -m "Configure git-annex"
time git add . # ~10minutes to hash and copy the entire dataset into .git/annex/objects/
git commit -m "Migrate dataset from smb://duke/sct_testing/large to git-annex."
git remote add origin git@data.neuro.polymtl.ca:datasets/sct_testing/large.git
git annex sync --content origin
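Before the big git add, it can be worth checking that the .gitattributes rules catch what they should. Here is a sketch using git check-attr in a throwaway repo. The file names are made-up examples, and git-annex itself is not needed for the check.

```shell
# Build a scratch repo with the same .gitattributes as above
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
cat > .gitattributes <<EOF
*.nii filter=annex annex.largefiles=anything
*.nii.gz filter=annex annex.largefiles=anything
EOF

# Ask git which attributes apply to each path (the files need not exist)
git check-attr annex.largefiles -- sub-01_T1w.nii.gz README.txt
# sub-01_T1w.nii.gz: annex.largefiles: anything
# README.txt: annex.largefiles: unspecified
```

Anything reported as "unspecified" falls back to whatever git-annex would otherwise do for that path; only the NIfTI files are covered by these rules.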
EDIT: from review by @jcohenadad, I moved this to git@data.neuro.polymtl.ca:datasets/sct-testing-large.git. It turned out I could do this just by:
git@data$ cd ~/repositories/datasets
git@data$ mv sct_testing/large.git sct-testing-large.git
Okay we've fixed the outage, finally. Now back to actually documenting and using this.
@jcohenadad asked me to set up protected branches.
I just need to do a bit of experimenting to get this to work right.
Subtask: #27.
I also want to look into how permissions are persisted between repositories. In gitolite, whoever uploads a repo first is special: they are the CREATOR, and they can delegate permissions to branches to other users.
I have concerns with this like: what happens if that person goes missing?
We also need to set up backups but I'm tracking that in #20.
An important tweak: /etc/fstab now automounts the data drive, to prevent a crash there from killing the whole system again:
root@donnees# cat >>/etc/fstab <<EOF
# datasets
UUID=<whatever the new filesystem ID is> /srv/git/repositories ext4 errors=remount-ro,noauto,x-systemd.automount 0 1
EOF
I removed "create" from the ENABLED list, from feedback from @joshuacwnewton.
That leaves only one way to make a new repo:
git push
(ssh git@data.neuro.polymtl.ca create is gone.)
But now there's an awkward asymmetry, because we have:
git push
ssh git@data.neuro.polymtl.ca D
(side issue: can we rename D to delete?)
I removed "create" from the ENABLED list, from feedback from @joshuacwnewton:
Just to clarify, I suggested this because (unless I'm missing something), regular users weren't able to ssh git@data.neuro.polymtl.ca create anyway:
joshua@XPS-15-9560:~$ ssh git@data.neuro.polymtl.ca create test-permissions
FATAL: repo already exists or you are not authorised to create it
Or, was I supposed to do ssh git@data.neuro.polymtl.ca create datasets/<repo-name> because that's the directory I have permissions for?
This is in production now and has been for a while.
I am deploying a git server on poly's internal infrastructure. This is cheaper, and faster (bandwidth-wise) than paying Amazon or Github for hosting the large datasets we experiment on, and safer for our datasets with privileged medical data.