GIN

https://praxisinstitute.org wants to fund a Canada-wide spine scan sharing platform.

They were considering paying OBI as a vendor, and having them set up a neuroimaging repository. But had doubts about the quality of that solution and have looked around for others, and have landed on asking us for help.

We've proposed a federated data sharing plan and they are interested in pursuing this line.

Needs

publication (marketting, accessibility): datasets need to be obvious enough and easy enough to get at that they get used
- we're thinking this looks like a portal or meta-index site; something, like https://portal.conp.ca/
curation: datasets need to be kept in a high quality state, with good metadata and artifacts corrected
- this means specifying formats (like BIDS, nifti, etc)
- having scripts to automatically check these formats like we do in spine-generic
- specifying a checklist of manual curation steps
- training a curator at each participating hospital/site
data protection: datasets must have effective ACLs attached; the ACLs should implement the data sharing and consent agreements that each source study gets
- this is one of the motivations for making a federated system: each jurisdiction operates under slightly different data protection laws and making a single site for everyone is legally fraught
uploading: data needs to get from the scanners into the data archive
- this is, from our source on the inside, the single most expensive and difficult part. Most hospitals are running a proprietary PACS system where they store their images; extracting the images from that to a regular computer is often extremely tedious, manual, and expensive.
- so, part of the project is to specify a format (as above) and aid each site in writing scripts to get data from their PACS system into that format and uploaded
- versioning: data should be versioned, so that work can be reproduced
- mirroring: data should be easily backed up from one site to another

Software Design

Global Architecture

                             bidirectional mirroring
                                           │
                                           │                           ┌───────────────┐
                                      ┌────┼──────────────────────────▲│               │
                                      │┼───────────────────────────────┤ data.site4.ca │
                                      ││                               │               │
                                      ││                               ├────────┬────┬─┘
                                      ││                         ┌─────┴────────┘    │
                                      ││                         │                   │
                                      ││                         │                   │
                      ┌───────────────┘▼┐                        │                   │
                      │                 │                        │                   │
                      │ images.site5.ca │                        │                   │
                      │                 │                        │                   │
                      └─────────────────┴─┐                      │                   │
                                          │                      │                   ├──────mirroring
                                          │                      ├────notifying      │
                                          │                      │                   │
                                          │                      │                   │
                                          │                      │                   │
                                       xxx│xxxxxxxxxxxxxxxxxxxxxx▼xxxxxx             │
                                       x┌─▼───────────────────────────┐x             ▼
   ┌─────────────────┐                 x│                             │x         ┌─────────────────┐
   │                 │                 x│   data.praxisinstitute.org   │x         │imagerie.site7.ca│
   │ spines.site6.ca ├─────────────────►│                             │◄─────────┴─┬───────────────┘
   │                 │                 x└─────────────────────────────┘x           │
   └──────────┬────┬─┘                 xxxx▲xxxxxxx▲xxxxxxxxxxxxxxx▲xxxx           │
              │    │                       │       │               │               │
              │    │                       │       │               │               │
              │    │                       │       │               │               │      ┌───────────────┐
              │    │                       │       │               ├───────────────┼──────┘ data.site3.ca │
              │    └───────────────────────┼───────┼───────────────┴───────────────┼──────▼─┬─────────────┘
              ▼                            │       │                               │        │
┌────────────────┐                         │       │                               │        │
│                ├─────────────────────────┘       │                               │        │
│ data.site1.ca  │                                 │                               │        │
│                │                                 │                               │        │
└─────────────┬─▲┘                                 │                               │        │
              │ │                                  │                               │        │
              │ │                                  │                               │        │
              │ │                     ┌────────────┴┐                              │        │
              │ └─────────────────────┤             │                              │        │
              │                       │data.site2.ca◄──────────────────────────────┼────────┘
              └──────────────────────►│             │                              │
                                      └─────────────┘◄─────────────────────────────┘

(edit diagram)

Per-Site Architecture

┌──────────────────┐            ┌──────────────────┐           ┌───────────────────┐           ┌───────────────────────────┐
│                  │            │                  │           │                   │           │                           │
│                  │            │                  │           │                   │           │                           │
│                  │            │                  │           │                   │           │                           │
│                  │            │                  │           │                   │           │                           │
│       PACS       ├─────┬──────►       BIDS       ├─────┬────►│  data.example.ca  ├─────┬────►│  data.praxisinstitute.ca  │
│                  │     │      │                  │     │     │                   │     │     │                           │
│                  │     │      │                  │     │     │   (data server)   │     │     │         (portal)          │
│                  │     │      │                  │     │     │                   │     │     │                           │
│                  │     │      │                  │     │     │                   │     │     │                           │
└──────────────────┘     │      └──────────────────┘     │     └───────────────────┘     │     └───────────────────────────┘
                         │                               │                               │
                         │                               │                               │
                         │                               │                               │
                         │                               │                               │
                   export scripts                     uploader                        notifier
                written by each site.               written by us                   written by us
                                             (`git push`, `rsync -a`, etc)      (cronjob: curl data.praxisinstitute.ca/notify ...)

(edit diagram)

Components

It's easy to spend a lot of money writing software from scratch. I don't think we should do that. I think what we should do is design some customizations of existing software and build packages that deploy them.

Data Servers

I have two options in mind for the data servers:

GIN - this is a datalad server
NextCloud - this is a WebDAV (and more) server

GIN is itself built on top of pre-existing open source projects: Gogs, git-annex, datalad, git, combined in a customized package. We would take it and further customize it. It is a little more sprawling than NextCloud. Being focused on neuroscience, we could easily upstream customizations we design back to them to help out the broader community.

NextCloud is a lot simpler to use than datalad. You can mount it onto Windows, Linux, and macOS as a cloud disk (via WebDAV). It also has a strong company behind it, lots of users, good apps. It's meant for more general use than science; actually, it was never designed for science. It would be harder to share any improvements we make to it; though we could publish our packages and any plugins back to the wider NextCloud ecosystem. It has some other weaknesses too.

Uploading/Downloading

With GIN, uploading can be done with git:

Follow https://github.com/neuropoly/data-management/blob/master/git-annex.md#new-repo, or something like it (these are our instructions; different sites might disagree; this is one of the ways GIN/datalad is a bit more ad-hoc and difficult)

Do this:

git remote add origin git@data.site1.ca:my-new-dataset.git
git push -u origin master # or, actually, maybe GIN forces you to first make the remote repo in its UI? Unsure
git annex sync --content

Downloading is replacing the first two lines with git clone.

Windows and macOS do not have a git client built in.

With NextCloud, uploading can be done with davfs2 + rsync:

mount -t davfs2 data.site1.ca /mnt/data  # something close to this anyway
rsync -av my-new-dataset /mnt/data/

Downloading is just reversing the arguments to rsync.

There's also cadaver, and Windows and macOS have WebDAV built in.

Versioning

GIN is based on git, so it has very strong versioning.

There are git fsck and git annex fsck to validate that what's on-disk is as expected.

NextCloud only supports weak versioning.

But maybe we can write a plugin that improves that. Somehow. We would have to figure out a way to mount an old version of a dataset.

Permissions

NextCloud has federated ACLs built in: users on data.site1.ca can grant chosen users on spines.site6.ca access to specific folders A/, B/ and D/.

I am unsure what GIN has; since it's based on Gogs, it probably has public/private/protected datasets, all the same controls that Github and Gitlab implement, but I don't think it supports federated ACLs. Federation with GIN might look like everyone having to have one account per site.

But maybe we could improve that; perhaps we could patch GIN to supported federated ACLs as well. We would need to study how NextCloud does it, how GIN does it, and see where we can fit them in.

Sharing

In a federated model, data-sharing is done bidirectionally: humans at each site grant each other access to data sets, one at a time.

We should encourage the actual data sharing to happen via mirroring, for the sake of encouraging resiliancy in the network.

I think we should encourage the

Gitlab supports mirroring; on https://gitlab.com/$user/$repo/-/settings/repository you will find the mirror settings

Gitlab Example

![2021-05-19-223217_792x822_scrot](https://user-images.githubusercontent.com/987487/118910267-32465a80-b8f2-11eb-8861-67ca7e7055fb.png)

We need to replicate this kind of UI in whatever we deploy.

Portal

For the portal, we can probably write most of it using a static site generator like hugo, plus a small bit of code to add (and remove) datasets.

The dataset list can be generated either by pushing or pulling: the data servers could notify the portal (this is how I've drawn the diagrams above) or the portal could connect to the data servers in a cronjob to ask them what datasets they have. Generally, the latter is more stable, but the former is more accurate.

It should be possible to keep the list of datasets, one per file, in a folder on the portal site, and have hugo automatically read them all and produce a big, cross-referenced, searchable index.

Packaging

We should provide an automated installer for each site to deploy the software parts. It should be as automated as possible: no more than 5 manual install steps.

We can build the packages in:

ansible
conda
custom bash install scripts
.deb

I think .deb is the smoothest option here; I have some experience using pacur, and we can use their package server to host the .debs. Whatever we pick, the important thing is that we deliver the software reproducibly and with as little manual customization as possible.

I think we should produce two packages:

praxis-data-server - the per-site software
praxis-index-server - the portal site

We might also need to produce uploader scripts; but because praxis-data-server will use standard upload protocols it's not as important to do so; moreover, because the uploader will be run directly by users, it will need to deal with cross-platform issues which makes it harder to package. I think, at least as a first draft, we should just document what command lines will get your data uploaded

Software Development Tasks

write specification for what a proper dataset looks like; probably BIDS, maybe with some extra restrictions
deploy data server prototypes
write data server customizations
write uploader scripts?
write PACS -> BIDS scripts that conform to that specification
- This is an ad-hoc, per site task. We can help, but we can't write these scripts ourselves.
write portal site
write data server -> portal notifier
- sender
- receiver
- write packages
  - data server
  - portal server
- write uploading documentation
- write downloading documentation
- write ACL documentation

Open Questions

is datalad (the GIN client) compatible with Windows? Is it compatible with Windows without using WSL?
Is there a way to build software support for attaching data sharing agreements to the permissions?
- I'm envisioning a situation where, next to a mirror of a dataset, its data sharing agreement .pdf is linked; perhaps even some of the terms, like expiry dates, can be programmed in.
NextCloud or GIN? Or something else?
Should we focus on trying to develop federated ACLs for GIN, or

Summary

We can build a federated data system on either GIN or nextcloud. Either one will require some software development to tune it for our use case, but much less than writing a system from scratch. Both support widely supported network protocols which makes them cross-compatible, reliable, and avoids the cost of developing custom clients.

Cost Estimates

2 devs * 1 month to produce the packages: ~10k-20k $
N*5k$ - per-site server hardware
N*5k$ - per-site sysadmin time
N*40k$ - per-site data curator time

Which works out to about 18 or 19 sites that Praxis can fund this year.

I'm going to make some demos to make this more concrete.

I'm starting with NextCloud. I'm going to deploy 3 NextClouds and configure them with federation sharing.

Hardware

The first thing to do is to get some hardware. Vultr.com has cheap VPSes. I bought three of them in Canada (the screenshot covered my datacenter selection; trust me, it's in Canada):

Screenshot_2021-05-20 Deploy Servers

Notice I'm choosing Debian, but any Linux option would work.

Just gotta wait for them to deploy....

2021-05-20-135220_1252x273_scrot

And they're up:

2021-05-20-135306_1265x225_scrot

Networking

The second thing to do is set up DNS so these are live net-wide servers.

I went over to my personal DNS server and added

2021-05-20-135612_756x571_scrot 2021-05-20-135815_753x574_scrot

2021-05-20-135702_756x567_scrot

Now just gotta wait for that to deploy...

and they're up too:

[kousu@requiem ~]$ dig data1.praxis.kousu.ca
216.128.176.232
[kousu@requiem ~]$ dig data2.praxis.kousu.ca
216.128.179.150
[kousu@requiem ~]$ dig data3.praxis.kousu.ca
149.248.50.100

Sysadmin Access Control

I just need to make sure I have access secured. I'm going to do two things:

I go to the VPS settings one at a time and grab the root passwords:

2021-05-20-140156_384x187_scrot

then I log in, and confirm the system looks about right:

[kousu@requiem ~]$ ssh root@data1.praxis.kousu.ca
root@data1.praxis.kousu.ca's password: 
Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@data1:~# ls /home/
root@data1:~# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Then I use ssh-copy-id to enroll myself:

[kousu@requiem ~]$ ssh-copy-id -i ~/.ssh/id_ed25519 root@data1.praxis.kousu.ca
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/kousu/.ssh/id_ed25519.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@data1.praxis.kousu.ca's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@data1.praxis.kousu.ca'"
and check to make sure that only the key(s) you wanted were added.

[kousu@requiem ~]$ ssh root@data1.praxis.kousu.ca
Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu May 20 17:59:53 2021 from 192.222.158.190
root@data1:~#

And now that that works, I disable root password login, which is a pretty important security baseline:

root@data1:~# sed -i 's/PermitRootLogin yes/#PermitRootLogin no/' /etc/ssh/sshd_config 
root@data1:~# systemctl restart sshd

In a different terminal without disconnecting, in case we need to do repairs, verify this worked by:

that I can still ssh in using the key

[kousu@requiem ~]$ ssh root@data1.praxis.kousu.ca
Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu May 20 18:06:42 2021 from 192.222.158.190

that, when I tell ssh to only use password auth, it rejects me

[kousu@requiem ~]$ ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no root@data1.praxis.kousu.ca
root@data1.praxis.kousu.ca's password: 
Permission denied, please try again.

I'm also going to add a sudo account, as a backup:

First, invent a password:

[kousu@requiem ~]$ pass generate -n -c servers/praxis/data1.praxis.kousu.ca

Then make the account:

root@data1:~# sed -i 's|/bin/sh|/bin/bash|' /etc/default/useradd 
root@data1:~# useradd -m kousu
root@data1:~# passwd kousu
New password: 
Retype new password: 
passwd: password updated successfully
root@data1:~# usermod -a -G sudo kousu

Test the account:

[kousu@requiem ~]$ ssh data1.praxis.kousu.ca
kousu@data1.praxis.kousu.ca's password: 
Linux data1.praxis.kousu.ca 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu May 20 18:14:38 2021 from 192.222.158.190
$ sudo ls

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for kousu: 
$ groups
kousu sudo
$

So this means I have two ways in, the root password is disabled, my own user password is lengthy and secure.

Now repeat the same for data2.praxis.kousu.ca and data3.praxis.kousu.ca.

Basic config

Set system hostname -> already done by Vultr, thanks Vultr

(and repeat for each of the three)

updates

root@data1:~# apt-get update && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
Hit:1 http://deb.debian.org/debian buster InRelease
Get:2 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:3 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main Sources [185 kB]
Get:5 http://security.debian.org/debian-security buster/updates/main amd64 Packages [289 kB]
Get:6 http://security.debian.org/debian-security buster/updates/main Translation-en [150 kB]
Fetched 740 kB in 0s (2283 kB/s)                          
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

(and repeat for each of the three)

`unattended-upgrades`

apt-get install unattended-upgrades

Configure it like I've done for our internal servers: enable regular updates, not just security ones, do updates once a week, enable auto-reboot.

/etc/apt/apt.conf.d/50unattended-upgrades

``` // Unattended-Upgrade::Origins-Pattern controls which packages are // upgraded. // // Lines below have the format format is "keyword=value,...". A // package will be upgraded only if the values in its metadata match // all the supplied keywords in a line. (In other words, omitted // keywords are wild cards.) The keywords originate from the Release // file, but several aliases are accepted. The accepted keywords are: // a,archive,suite (eg, "stable") // c,component (eg, "main", "contrib", "non-free") // l,label (eg, "Debian", "Debian-Security") // o,origin (eg, "Debian", "Unofficial Multimedia Packages") // n,codename (eg, "jessie", "jessie-updates") // site (eg, "http.debian.net") // The available values on the system are printed by the command // "apt-cache policy", and can be debugged by running // "unattended-upgrades -d" and looking at the log file. // // Within lines unattended-upgrades allows 2 macros whose values are // derived from /etc/debian_version: // ${distro_id} Installed origin. // ${distro_codename} Installed codename (eg, "buster") Unattended-Upgrade::Origins-Pattern { // Codename based matching: // This will follow the migration of a release through different // archives (e.g. from testing to stable and later oldstable). // Software will be the latest available for the named release, // but the Debian release itself will not be automatically upgraded. "origin=Debian,codename=${distro_codename}-updates"; // "origin=Debian,codename=${distro_codename}-proposed-updates"; "origin=Debian,codename=${distro_codename},label=Debian"; "origin=Debian,codename=${distro_codename},label=Debian-Security"; // Archive or Suite based matching: // Note that this will silently match a different release after // migration to the specified archive (e.g. testing becomes the // new stable). // "o=Debian,a=stable"; // "o=Debian,a=stable-updates"; // "o=Debian,a=proposed-updates"; // "o=Debian Backports,a=${distro_codename}-backports,l=Debian Backports"; }; // Python regular expressions, matching packages to exclude from upgrading Unattended-Upgrade::Package-Blacklist { // The following matches all packages starting with linux- // "linux-"; // Use $ to explicitely define the end of a package name. Without // the $, "libc6" would match all of them. // "libc6$"; // "libc6-dev$"; // "libc6-i686$"; // Special characters need escaping // "libstdc\+\+6$"; // The following matches packages like xen-system-amd64, xen-utils-4.1, // xenstore-utils and libxenstore3.0 // "(lib)?xen(store)?"; // For more information about Python regular expressions, see // https://docs.python.org/3/howto/regex.html }; // This option allows you to control if on a unclean dpkg exit // unattended-upgrades will automatically run // dpkg --force-confold --configure -a // The default is true, to ensure updates keep getting installed //Unattended-Upgrade::AutoFixInterruptedDpkg "true"; // Split the upgrade into the smallest possible chunks so that // they can be interrupted with SIGTERM. This makes the upgrade // a bit slower but it has the benefit that shutdown while a upgrade // is running is possible (with a small delay) //Unattended-Upgrade::MinimalSteps "true"; // Install all updates when the machine is shutting down // instead of doing it in the background while the machine is running. // This will (obviously) make shutdown slower. // Unattended-upgrades increases logind's InhibitDelayMaxSec to 30s. // This allows more time for unattended-upgrades to shut down gracefully // or even install a few packages in InstallOnShutdown mode, but is still a // big step back from the 30 minutes allowed for InstallOnShutdown previously. // Users enabling InstallOnShutdown mode are advised to increase // InhibitDelayMaxSec even further, possibly to 30 minutes. //Unattended-Upgrade::InstallOnShutdown "false"; // Send email to this address for problems or packages upgrades // If empty or unset then no email is sent, make sure that you // have a working mail setup on your system. A package that provides // 'mailx' must be installed. E.g. "user@example.com" Unattended-Upgrade::Mail "root"; // Set this value to "true" to get emails only on errors. Default // is to always send a mail if Unattended-Upgrade::Mail is set //Unattended-Upgrade::MailOnlyOnError "false"; // Remove unused automatically installed kernel-related packages // (kernel images, kernel headers and kernel version locked tools). //Unattended-Upgrade::Remove-Unused-Kernel-Packages "true"; // Do automatic removal of newly unused dependencies after the upgrade //Unattended-Upgrade::Remove-New-Unused-Dependencies "true"; // Do automatic removal of unused packages after the upgrade // (equivalent to apt-get autoremove) Unattended-Upgrade::Remove-Unused-Dependencies "true"; // Automatically reboot *WITHOUT CONFIRMATION* if // the file /var/run/reboot-required is found after the upgrade Unattended-Upgrade::Automatic-Reboot "true"; // Automatically reboot even if there are users currently logged in // when Unattended-Upgrade::Automatic-Reboot is set to true //Unattended-Upgrade::Automatic-Reboot-WithUsers "true"; // If automatic reboot is enabled and needed, reboot at the specific // time instead of immediately // Default: "now" Unattended-Upgrade::Automatic-Reboot-Time "02:00"; Unattended-Upgrade::Update-Days {5}; // Use apt bandwidth limit feature, this example limits the download // speed to 70kb/sec //Acquire::http::Dl-Limit "70"; // Enable logging to syslog. Default is False // Unattended-Upgrade::SyslogEnable "false"; // Specify syslog facility. Default is daemon // Unattended-Upgrade::SyslogFacility "daemon"; // Download and install upgrades only on AC power // (i.e. skip or gracefully stop updates on battery) // Unattended-Upgrade::OnlyOnACPower "true"; // Download and install upgrades only on non-metered connection // (i.e. skip or gracefully stop updates on a metered connection) // Unattended-Upgrade::Skip-Updates-On-Metered-Connections "true"; // Verbose logging // Unattended-Upgrade::Verbose "false"; // Print debugging information both in unattended-upgrades and // in unattended-upgrade-shutdown // Unattended-Upgrade::Debug "false"; ```

/etc/apt/apt.conf.d/20auto-upgrades

``` APT::Periodic::Update-Package-Lists "1"; APT::Periodic::Unattended-Upgrade "1"; APT::Periodic::AutocleanInterval "7"; APT::Periodic::CleanInterval "7"; ```

[ ] install mailer

root@data1:~# hostname > /etc/mailname 
root@data1:~# DEBIAN_FRONTEND=noninteractive apt-get install -y opensmtpd
root@data1:~# echo nick@kousu.ca >> ~root/.forward

test mailer:

in one terminal:

# journalctl -f -u opensmtpd

With the help of https://www.mail-tester.com/, in another:

root@data3:~# mail -s "testing outgoing" test-bxdanliiq@srv1.mail-tester.com
Cc: 
Hi there, will this go through?

opensmtpd logs say:

May 21 01:16:43 data3.praxis.kousu.ca smtpd[5954]: 84dff16a26522d0b smtp event=connected address=local host=data3.praxis.kousu.ca
May 21 01:16:43 data3.praxis.kousu.ca smtpd[5954]: 84dff16a26522d0b smtp event=message address=local host=data3.praxis.kousu.ca msgid=1eb206e1 from=<root@data3.praxis.kousu.ca> to=<test-bxdanliiq@srv1.mail-tester.com> size=471 ndest=1 proto=ESMTP
May 21 01:16:43 data3.praxis.kousu.ca smtpd[5954]: 84dff16a26522d0b smtp event=closed address=local host=data3.praxis.kousu.ca reason=quit
May 21 01:16:44 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=connecting address=smtp+tls://94.23.206.89:25 host=mail-tester.com
May 21 01:16:44 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=connected
May 21 01:16:45 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=starttls ciphers=version=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, bits=256
May 21 01:16:45 data3.praxis.kousu.ca smtpd[5954]: smtp-out: Server certificate verification succeeded on session 84dff16ec08d0171

May 21 01:16:46 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=delivery evpid=1eb206e1eff6a367 from=<root@data3.praxis.kousu.ca> to=<test-bxdanliiq@srv1.mail-tester.com> rcpt=<-> source="149.248.50.100" relay="94.23.206.89 (mail-tester.com)" delay=3s result="Ok" stat="250 2.0.0 Ok: queued as 3B00EA0237"
May 21 01:16:56 data3.praxis.kousu.ca smtpd[5954]: 84dff16ec08d0171 mta event=closed reason=quit messages=1

and it got to mail-tester.com; but mail-tester scored it low because the DNS needs work:

need to fix the reverse DNS at vultr
need to add an SPF record
need to add an MX record

[ ] install, uh, netdata?
[ ] letsencrypt
[ ] nginx
[ ] nextcloud
1. [ ] configure nextcloud using occ

mailer needs: sudo ufw allow 25/tcp. This must be an upgrade on Vultr's end since the last time I set up servers.

la la la

minutes

Alternatives:

zfs
- we're not the only ones to have this thought
- pros:
  - solid codebase
  - Copy-on-Write log-structured thing
- cons:
  - probably not good for data sharing; multiple users can't share a dataset? or if they can
  - requires connecting to the host server to set the version, using CLI tools

probably not good for data sharing; multiple users can't share a dataset? or if they can requires connecting to the host server to set the version, using CLI tools

I'm not sure I understand that. For example, in the context of NeuroPoly's internal data (which are currently versioned/distributed with git-annex), would it be considered "one user sharing a dataset"? And if so, would ZFS be limited for this specific use-case?

My (weak) understanding is that with zfs you have to do:

git commit ~= sudo zfs snapshot $ZFS_ROOT/$VERSION
git checkout $VERSION ~= sudo mount -t zfs $ZFS_ROOT/$VERSION $PATH

So actually, yes, a single zfs instance can be shared with multiple users, so long as everyone has a) direct ssh access to the data server b) sudo rights on that data server.

Alternately, an admin (i.e. you or me or Alex) could ssh in, mount a snapshot, and expose it more safely to users over afp://, smb://, nfs://, sshfs://. But then users need to be constantly coordinating with their sysadmins. Maybe that's okay for slow-moving datasets like the envisioned Praxis system but it would be pretty awkward for daily use here.

Although there is this: on Linux, you can do 'checkout' without mounting like this:

# cd $PATH/.zfs/snapshot/$VERSION

You still need sudo to make a commit; probably to make any change at all. But I guess we could make this work if the goal is just reproducibility?

It looks like we are going towards a centralized solution. In brief:

each participating site will curate their data into BIDS and send anonymized data to the centralized server
a mechanism will need to be put in place for participating sites to push data (and centralized server to review what is pushed).
a web interface (eg like a portal) will display the available datasets with instructions for download
a mechanism is needed to only allow people with certain permissions to download the data (eg TOKEN distributed alongside IRB/NDA approval?)
centralized server will run cron job to ensure BIDS compliance

Question is: where to host this centralized server

How about compute canada? the notably provide this virtualization solution. It is based on OpenStack.

It's Demo day:

We have been allocated an account on https://arbutus.cloud.computecanada.ca/. Docs at https://docs.computecanada.ca/wiki/Cloud_Quick_Start.

I'm going to install GIN on it: https://gin.g-node.org/G-Node/Info/wiki/In+House

Let's see how fast I can do this.

[x] auth: log in with username/pass at https://arbutus.cloud.computecanada.ca/
[x] auth: xclip -selection clipboard ~/.ssh/id_ed25519.neuropoly.pub and paste into https://arbutus.cloud.computecanada.ca/project/key_pairs -> Import Public Key; repeat for @jcohenadad's key from ansible
[x] Allocate VM: https://arbutus.cloud.computecanada.ca/project/instances/ -> " Launch Instance" ->
- Name = praxis-gin
- Availability Zone = persistent_01
- Source = Ubuntu-20.04.2-Focal-minimal-x64-2021-05, flavor = p1-1.5gb
- everything else at defaults
- [x] "Launch Instance"
[x] Networking: https://arbutus.cloud.computecanada.ca/project/instances/ -> praxis-gin -> Actions -> Associate Floating IP
- [x] + -> Manage Floating IP
  - [x] DNS Domain = praxis.kousu.ca, DNS Name = data1.praxis.kousu.ca (these are not long term names; just repurposing my demo)
  - [x] Allocate IP
  - "Error: Unable to allocate Floating IP. " :/
  - [x] DNS Domain = "", DNS Name = ""
  - [x] Allocate IP
  - 206.12.93.20 is allocated
- [x] "Associate"
- \
- [x] Firewall: https://arbutus.cloud.computecanada.ca/project/security_groups/
  - [x] Add Rule -> All ICMP -> Add
  - [x] Add Rule -> HTTP -> Add
  - [x] Add Rule -> HTTPS -> Add
  - [x] Add Rule -> SSH -> Add
  - [x] Add Rule -> SMTP -> Add
[x] Test connection

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly jcohen@206.12.93.20
The authenticity of host '206.12.93.20 (206.12.93.20)' can't be established.
ED25519 key fingerprint is SHA256:Z82G+UO/D3ZRJV53eQeaVt2rWSaVFmhLcEwbHO519Ig.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '206.12.93.20' (ED25519) to the list of known hosts.
jcohen@206.12.93.20: Permission denied (publickey).
[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly root@206.12.93.20
root@206.12.93.20: Permission denied (publickey).

hm okay what's wrong?

Okay docs say I should be using the username "ubuntu". That doesn't work either.

It seems like it just hung? I deleted and remade the instance with

Name = praxx Availability Zone = any (the docs said I shouldn't have changed this) Source = Ubuntu-20.04.2-Focal-x64-2021-05 flavor = p2-3gb everything else at defaults

I still can't get in though:

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly ubuntu@206.12.93.20
The authenticity of host '206.12.93.20 (206.12.93.20)' can't be established.
ED25519 key fingerprint is SHA256:nAE3NfUZ1R6uSdr3GUeuJPJ1gENAQdexM29r0EM8vxs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '206.12.93.20' (ED25519) to the list of known hosts.
ubuntu@206.12.93.20: Permission denied (publickey).
[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa ubuntu@206.12.93.20
id_rsa         id_rsa.github  
[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa ubuntu@206.12.93.20
ubuntu@206.12.93.20: Permission denied (publickey).

Oh I see what the problem is:

Paste your public key (only RSA type SSH keys are currently supported).

drat.

But I...added my rsa key? And it's still not working?

[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa ubuntu@206.12.93.20
ubuntu@206.12.93.20: Permission denied (publickey).

hm.

The system log (which openstack will show you) says

[ 43.704728] cloud-init[1249]: ci-info: no authorized SSH keys fingerprints found for user ubuntu.

so, hm. Why?

Oh I missed this step:

Key Pair: From the Available list, select the SSH key pair you created earlier by clicking the upwards arrow on the far right of its row. If you do not have a key pair, you can create or import one from this window using the buttons at the top of the window (please see above). For more detailed information on managing and using key pairs see SSH Keys.

Delete and recreate with

Name = praxis-gin
Availability Zone = any (the docs said I shouldn't have changed this)
Source = Ubuntu-20.04.2-Focal-x64-2021-05
flavor = p2-3gb
keypair = nguenthe-requiem-rsa

It only allows you to init with a single keypair! Ah.

Got in:

[kousu@requiem ~]$ ssh-keygen -R 206.12.93.20
# Host 206.12.93.20 found: line 119
/home/kousu/.ssh/known_hosts updated.
Original contents retained as /home/kousu/.ssh/known_hosts.old
[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa ubuntu@206.12.93.20
The authenticity of host '206.12.93.20 (206.12.93.20)' can't be established.
ED25519 key fingerprint is SHA256:qJO/JofxCKeaGD71R5fxkGYlPBFAjfPOOPeeiWByqUc.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '206.12.93.20' (ED25519) to the list of known hosts.
Enter passphrase for key '/home/kousu/.ssh/id_rsa': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:42:01 UTC 2021

  System load:  0.3               Processes:             123
  Usage of /:   6.5% of 19.21GB   Users logged in:       0
  Memory usage: 6%                IPv4 address for ens3: 192.168.233.67
  Swap usage:   0%

1 update can be applied immediately.
To see these additional updates run: apt list --upgradable

The list of available updates is more than a week old.
To check for new updates run: sudo apt update

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@praxis-gin:~$ sudo ls
ubuntu@praxis-gin:~$

[x] basic system updates:

ubuntu@praxis-gin:~$ sudo apt-get update && DEBIAN_FRONTEND=noninteractive sudo apt-get dist-upgrade -y

[x] auth (again):

[kousu@requiem ~]$ ssh root@joplin -- cat '~/.ssh/authorized_keys' | ssh ubuntu@206.12.93.20 -- sudo tee -a '/root
[kousu@requiem ~]$ ssh root@joplin -- cat '~/.ssh/authorized_keys' | ssh ubuntu@206.12.93.20 -- tee -a '~/.ssh/authorized_keys'

test: root@

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly root@206.12.93.20
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:49:02 UTC 2021

  System load:  0.05              Processes:             128
  Usage of /:   9.2% of 19.21GB   Users logged in:       1
  Memory usage: 10%               IPv4 address for ens3: 192.168.233.67
  Swap usage:   0%

0 updates can be applied immediately.

*** System restart required ***
Last login: Wed Jul  7 16:48:06 2021 from 104.163.172.27
root@praxis-gin:~# logout
Connection to 206.12.93.20 closed.

ubuntu@

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly ubuntu@206.12.93.20
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:49:08 UTC 2021

  System load:  0.04              Processes:             127
  Usage of /:   9.2% of 19.21GB   Users logged in:       1
  Memory usage: 10%               IPv4 address for ens3: 192.168.233.67
  Swap usage:   0%

0 updates can be applied immediately.

*** System restart required ***
Last login: Wed Jul  7 16:42:04 2021 from 104.163.172.27

[x] finish updates: ubuntu@praxis-gin:~$ sudo reboot
[x] Log back in
[x] Install docker, since that's how GIN is packaged:

ubuntu@praxis-gin:~$ sudo apt-get install docker.io
ubuntu@praxis-gin:~$ sudo systemctl enable --now docker
ubuntu@praxis-gin:~$ sudo usermod -a -G docker ubuntu # grant rights
ubuntu@praxis-gin:~$ logout
Connection to 206.12.93.20 closed.

Test:

[kousu@requiem ~]$ ssh -i ~/.ssh/id_rsa ubuntu@206.12.93.20
Enter passphrase for key '/home/kousu/.ssh/id_rsa': 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-77-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Jul  7 16:53:32 UTC 2021

  System load:  0.67               Processes:                119
  Usage of /:   11.2% of 19.21GB   Users logged in:          0
  Memory usage: 9%                 IPv4 address for docker0: 172.17.0.1
  Swap usage:   0%                 IPv4 address for ens3:    192.168.233.67

0 updates can be applied immediately.

Last login: Wed Jul  7 16:50:59 2021 from 104.163.172.27
ubuntu@praxis-gin:~$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

Start following https://gin.g-node.org/G-Node/Info/wiki/In+House:

[x] Install ubuntu@praxis-gin:~$ docker pull gnode/gin-web:live
[x] firewall again: GIN wants port 3000 and 2222 open, so: https://arbutus.cloud.computecanada.ca/project/security_groups -> Manage -> Add rules for 3000 and 2222 ingress
[x] Run it: ubuntu@praxis-gin:~$ docker run -p 3000:3000 -p 2222:22 -d gnode/gin-web:live (NOTE: small bug in the instructions: they tell you to install the :live version but then to run the ` version which in docker implies:latest`).
[x] Test it seems to be up:

the ports are listening:

ubuntu@praxis-gin:~$ sudo netstat -nlpt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:2222            0.0.0.0:*               LISTEN      3520/docker-proxy   
tcp        0      0 127.0.0.1:44435         0.0.0.0:*               LISTEN      1376/containerd     
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      536/systemd-resolve 
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      611/sshd: /usr/sbin 
tcp        0      0 0.0.0.0:3000            0.0.0.0:*               LISTEN      3507/docker-proxy   
tcp6       0      0 :::22                   :::*                    LISTEN      611/sshd: /usr/sbin

[kousu@requiem ~]$ ssh -p 2222 206.12.93.20 
The authenticity of host '[206.12.93.20]:2222 ([206.12.93.20]:2222)' can't be established.
ED25519 key fingerprint is SHA256:41ELnYTqwKKUzA9zMFSopXmi953gc+ZGco9f4vqvF3g.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[206.12.93.20]:2222' (ED25519) to the list of known hosts.
kousu@206.12.93.20: Permission denied (publickey,keyboard-interactive).

(I don't have a key inside of GIN yet, so of course this fails, but it's listening)

Insecurely run the setup process by visiting http://206.12.93.20:3000:

![Screenshot 2021-07-07 at 13-01-48 Gogs](https://user-images.githubusercontent.com/987487/124800335-86011780-df23-11eb-8121-7756b6f963b1.png)

I filled out the options like this:

![Screenshot 2021-07-07 at 13-11-03 Gogs](https://user-images.githubusercontent.com/987487/124801408-d036c880-df24-11eb-9ef8-d7ccc998d50d.png)

[x] Give it a DNS name by logging into to my personal DNS server (I don't have rights to dns://neuro.polymtl.ca) and mapping A data1.praxis.kousu.ca -> 206.12.93.20.

Verify:

[kousu@requiem ~]$ ping data1.praxis.kousu.ca
PING data1.praxis.kousu.ca (206.12.93.20) 56(84) bytes of data.
64 bytes from 206-12-93-20.cloud.computecanada.ca (206.12.93.20): icmp_seq=1 ttl=43 time=86.3 ms
64 bytes from 206-12-93-20.cloud.computecanada.ca (206.12.93.20): icmp_seq=2 ttl=43 time=78.6 ms
^C
--- data1.praxis.kousu.ca ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 78.619/82.471/86.324/3.852 ms

Change the hostname to match:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname
praxis-gin
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo vi /etc/hostname 
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat /etc/hostname 
data1.praxis.kousu.ca
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo hostname $(cat /etc/hostname)
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname $(cat /etc/hostname)
hostname: you must be root to change the host name
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname
data1.praxis.kousu.ca

7.[ ] TLS Okay TLS is always a hoot. Let's see if I can do this in 10 minutes eh? I can front Gogs with an nginx reverse proxy

Actually I have this already, I can just copy the config out of https://github.com/neuropoly/computers/tree/master/ansible/roles/neuropoly-tls-server

ubuntu@praxis-gin:~$ sudo apt-get install nginx dehydrated

nginx config:

ubuntu@praxis-gin:~$ sudo vi /etc/nginx/sites-available/acme
ubuntu@praxis-gin:~$ cat /etc/nginx/sites-available/acme

server {
    listen      80 default_server;
    listen [::]:80 default_server;

    server_name _;

    # This glues together using both a reverse-proxy over to the dev server, while still letting ACME work
    # https://serverfault.com/questions/768509/lets-encrypt-with-an-nginx-reverse-proxy
    # Notice: this server { } listens to *all* hostnames, so any DNS record pointed at this box can be issued a ACME cert
    location ^~ /.well-known/acme-challenge {
        alias /var/lib/dehydrated/acme-challenges;
    }

    # enforce https
    # so long as this is the only `server{}` run on port 80, all http connections get rewritten to https ones.
    # ($host is pulled from the client's request, along with $request_uri, so this line works for *any* virtual host we care to make)
    location / {
        # 307 is a temporary redirect, to avoid causing bugs due to browser caching while developing this ability
        # but 301 would be more efficient in the long term
        return 307 https://$host$request_uri;
    }
}

server {
    # this is a copy of what's in "snippets/ssl.conf", but without claiming 'default_server'
    # it is necessary in order to auto-verify the SSL config after deploying certificates.

    listen 443 ssl;
    listen [::]:443 ssl;

    include "snippets/_ssl.conf";
}

ubuntu@praxis-gin:~$ sudo vi /etc/nginx/snippets/_ssl.conf
ubuntu@praxis-gin:~$ cat /etc/nginx/snippets/_ssl.conf
ssl_certificate /etc/ssl/acme/data1.praxis.kousu.ca/fullchain.pem;
ssl_certificate_key /etc/ssl/acme/data1.praxis.kousu.ca/privkey.pem;

gzip off; # anti-BREACH: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773332

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers "HIGH:!aNULL"; # OpenBSD's recommendation: https://man.openbsd.org/httpd.conf
ssl_prefer_server_ciphers on;

ubuntu@praxis-gin:~$ cd /etc/nginx/sites-enabled/
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo ln -s ../sites-available/acme 
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo rm default 
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ ls -l 
total 0
lrwxrwxrwx 1 root root 23 Jul  7 17:32 acme -> ../sites-available/acme

dehydrated config:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ hostname | sudo tee /etc/dehydrated/domains.txt
data1.praxis.kousu.ca

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat /etc/dehydrated/conf.d/neuropoly.sh
AUTO_CLEANUP=yes

# TODO: set this to the sysadmin mailing list: https://github.com/neuropoly/computers/issues/39
CONTACT_EMAIL=neuropoly@googlegroups.com

CERTDIR=/etc/ssl/acme

# it would be nice to use the default more efficient ECDSA keys
#KEY_ALGO=secp384r1
# but netdata is incompatible with them
KEY_ALGO=rsa

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat acme 

server {
    listen      80 default_server;
    listen [::]:80 default_server;

    server_name _;

    # This glues together using both a reverse-proxy over to the dev server, while still letting ACME work
    # https://serverfault.com/questions/768509/lets-encrypt-with-an-nginx-reverse-proxy
    # Notice: this server { } listens to *all* hostnames, so any DNS record pointed at this box can be issued a ACME cert
    location ^~ /.well-known/acme-challenge {
        alias /var/lib/dehydrated/acme-challenges;
    }

    # enforce https
    # so long as this is the only `server{}` run on port 80, all http connections get rewritten to https ones.
    # ($host is pulled from the client's request, along with $request_uri, so this line works for *any* virtual host we care to make)
    location / {
        # 307 is a temporary redirect, to avoid causing bugs due to browser caching while developing this ability
        # but 301 would be more efficient in the long term
        return 307 https://$host$request_uri;
    }
}

server {
    # this is a copy of what's in "snippets/ssl.conf", but without claiming 'default_server'
    # it is necessary in order to auto-verify the SSL config after deploying certificates.

    listen 443 ssl;
    listen [::]:443 ssl;

    include "snippets/_ssl.conf";
}
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat ../snippets/_ssl.conf 
ssl_certificate /etc/ssl/acme/data1.praxis.kousu.ca/fullchain.pem;
ssl_certificate_key /etc/ssl/acme/data1.praxis.kousu.ca/privkey.pem;

gzip off; # anti-BREACH: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773332

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers "HIGH:!aNULL"; # OpenBSD's recommendation: https://man.openbsd.org/httpd.conf
ssl_prefer_server_ciphers on;
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ sudo systemctl restart nginx

Verify:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ curl -v https://data1.praxis.kousu.ca
*   Trying 206.12.93.20:443...
* TCP_NODELAY set
* Connected to data1.praxis.kousu.ca (206.12.93.20) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=data1.praxis.kousu.ca
*  start date: Jul  7 16:40:47 2021 GMT
*  expire date: Oct  5 16:40:46 2021 GMT
*  subjectAltName: host "data1.praxis.kousu.ca" matched cert's "data1.praxis.kousu.ca"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
> GET / HTTP/1.1
> Host: data1.praxis.kousu.ca
> User-Agent: curl/7.68.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.18.0 (Ubuntu)
< Date: Wed, 07 Jul 2021 17:43:21 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 21 Apr 2020 14:09:01 GMT
< Connection: keep-alive
< ETag: "5e9efe7d-264"
< Accept-Ranges: bytes
< 
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host data1.praxis.kousu.ca left intact

One thing not in ansible is the gogs reverse proxy part:

ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat gogs 
server {

    server_name _;

    listen 443 ssl;
    listen [::]:443 ssl;

    include "snippets/_ssl.conf";

    location / {
            proxy_set_header X-Real-IP $remote_addr;
            proxy_pass http://127.0.0.1:3000/;
    }
}
ubuntu@praxis-gin:/etc/nginx/sites-enabled$ cat acme 

server {
    listen      80 default_server;
    listen [::]:80 default_server;

    server_name _;

    # This glues together using both a reverse-proxy over to the dev server, while still letting ACME work
    # https://serverfault.com/questions/768509/lets-encrypt-with-an-nginx-reverse-proxy
    # Notice: this server { } listens to *all* hostnames, so any DNS record pointed at this box can be issued a ACME cert
    location ^~ /.well-known/acme-challenge {
        alias /var/lib/dehydrated/acme-challenges;
    }

    # enforce https
    # so long as this is the only `server{}` run on port 80, all http connections get rewritten to https ones.
    # ($host is pulled from the client's request, along with $request_uri, so this line works for *any* virtual host we care to make)
    location / {
        # 307 is a temporary redirect, to avoid causing bugs due to browser caching while developing this ability
        # but 301 would be more efficient in the long term
        return 307 https://$host$request_uri;
    }
}

server {
    # this is a copy of what's in "snippets/ssl.conf", but without claiming 'default_server'
    # it is necessary in order to auto-verify the SSL config after deploying certificates.

    #listen 443 ssl;
    #listen [::]:443 ssl;

    include "snippets/_ssl.conf";
}

NOTE: I disabled ssl in '/etc/nginx/sites-enabled/acme' because it was conflicting with gogs?? I don't know what's up with that. Gotta think through that more. Maybe ansible needs another patch.

It's working now: ``` ubuntu@praxis-gin:/etc/nginx/sites-enabled$ curl -v https://data1.praxis.kousu.ca * Trying 206.12.93.20:443... * TCP_NODELAY set * Connected to data1.praxis.kousu.ca (206.12.93.20) port 443 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/ssl/certs/ca-certificates.crt CApath: /etc/ssl/certs * TLSv1.3 (OUT), TLS handshake, Client hello (1): * TLSv1.3 (IN), TLS handshake, Server hello (2): * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): * TLSv1.3 (IN), TLS handshake, Certificate (11): * TLSv1.3 (IN), TLS handshake, CERT verify (15): * TLSv1.3 (IN), TLS handshake, Finished (20): * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): * TLSv1.3 (OUT), TLS handshake, Finished (20): * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 * ALPN, server accepted to use http/1.1 * Server certificate: * subject: CN=data1.praxis.kousu.ca * start date: Jul 7 16:40:47 2021 GMT * expire date: Oct 5 16:40:46 2021 GMT * subjectAltName: host "data1.praxis.kousu.ca" matched cert's "data1.praxis.kousu.ca" * issuer: C=US; O=Let's Encrypt; CN=R3 * SSL certificate verify ok. > GET / HTTP/1.1 > Host: data1.praxis.kousu.ca > User-Agent: curl/7.68.0 > Accept: */* > * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * old SSL session ID is stale, removing * Mark bundle as not supporting multiuse < HTTP/1.1 200 OK < Server: nginx/1.18.0 (Ubuntu) < Date: Wed, 07 Jul 2021 17:46:44 GMT < Content-Type: text/html; charset=UTF-8 < Transfer-Encoding: chunked < Connection: keep-alive < Set-Cookie: lang=en-US; Path=/; Max-Age=2147483647 < Set-Cookie: i_like_gogs=4956b07a72ea926c; Path=/; HttpOnly < Set-Cookie: _csrf=hPHf3Z_mpseJYpknbvtGHoPMh506MTYyNTY4MDAwNDI4Mjk0OTAzMQ; Path=/; Expires=Thu, 08 Jul 2021 17:46:44 GMT < Gogs

Modern Research Data Management for Neuroscience

...distributed version control, flavoured for science

Upload your data to private repositories.
Synchronise across devices.
Securely access your data from anywhere.

Collaborate with colleagues.
Make your data public.
Make your data citable with the GIN DOI service.

Uploaded files are automatically versioned.
Retrieve any previously uploaded version of a file.
Never lose file history.

Based on open source projects such as Git, git-annex, and Gogs. You can even set it up in your lab!

* Connection #0 to host data1.praxis.kousu.ca left intact ```

And check the user's view (notice the TLS icon is there)

2021-07-07-134816_1527x620_scrot

[x] Disable the port 3000 firewall rule in https://arbutus.cloud.computecanada.ca/project/security_groups/

2021-07-07-135436_1668x205_scrot

[ ] Figure out uploading via git. Gogs is running ssh on port 2222, which is..weird. But let's see if I can sort that out.
- go to https://data1.praxis.kousu.ca/user/settings/ssh, add the same ssh keys I did to the base server

[kousu@requiem ~]$ ssh -i ~/.ssh/id_ed25519.neuropoly -p 2222 git@data1.praxis.kousu.ca
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
PTY allocation request failed on channel 0
Hi there, You've successfully authenticated, but GIN does not provide shell access.
Connection to data1.praxis.kousu.ca closed.

GREAT. Now can I make this permanent?

[kousu@requiem ~]$ vi ~/.ssh/config
[kousu@requiem ~]$ tail -n 6 ~/.ssh/config

Host match *.praxis.kousu.ca
User git
Port 2222
IdentityFile ~/.ssh/id_ed25519.neuropoly

[kousu@requiem ~]$ ssh data1.praxis.kousu.ca
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
PTY allocation request failed on channel 0
Hi there, You've successfully authenticated, but GIN does not provide shell access.
Connection to data1.praxis.kousu.ca closed.

Awesome. Okay, can I use git with this?

Lets see if I can mirror our public dataset.

First, download it to my laptop (but not all of it, it's still pretty large; I ^C'd out of it):

``` [kousu@requiem datalad]$ git clone https://github.com/spine-generic/data-single-subject/ Cloning into 'data-single-subject'... remote: Enumerating objects: 2360, done. remote: Counting objects: 100% (2360/2360), done. remote: Compressing objects: 100% (1209/1209), done. remote: Total 2360 (delta 857), reused 2147 (delta 680), pack-reused 0 Receiving objects: 100% (2360/2360), 263.57 KiB | 1.68 MiB/s, done. Resolving deltas: 100% (857/857), done. [kousu@requiem datalad]$ cd data bash: cd: data: No such file or directory [kousu@requiem datalad]$ cd data-single-subject/ [kousu@requiem data-single-subject]$ git annex get . (scanning for unlocked files...) get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T1w.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T2star.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T2w.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (from amazon...) (checksum...) ok get sub-douglas/anat/sub-douglas_T1w.nii.gz (from amazon...) (checksum...) ok get sub-douglas/anat/sub-douglas_T2star.nii.gz (from amazon...) (checksum...) ok get sub-douglas/anat/sub-douglas_T2w.nii.gz (from amazon...) (checksum...) ok get sub-douglas/dwi/sub-douglas_dwi.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_T1w.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_T2star.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_T2w.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-glen/dwi/sub-glen_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_T1w.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_T2star.nii.gz (from amazon...) ^C ```

Okay, now, make a repo on the new server: https://data1.praxis.kousu.ca/repo/create ->

Screenshot 2021-07-07 at 14-02-05 Gogs

Oh here's a bug; drat; I wonder if I can change the hostname gogs knows for itself, or if I need to rebuild it:

Screenshot 2021-07-07 at 14-02-34 jcohen data-single-subject

Screenshot 2021-07-07 at 14-03-07 jcohen data-single-subject

But if I swap in the right URL, and deal with git-annex being awkward, it works:

[kousu@requiem data-single-subject]$ git remote add praxis git@data1.praxis.kousu.ca:/jcohen/data-single-subject.git
[kousu@requiem data-single-subject]$ git push praxis master
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Enumerating objects: 708, done.
Counting objects: 100% (708/708), done.
Delta compression using up to 4 threads
Compressing objects: 100% (407/407), done.
Writing objects: 100% (708/708), 142.68 KiB | 142.68 MiB/s, done.
Total 708 (delta 271), reused 708 (delta 271), pack-reused 0
remote: Resolving deltas: 100% (271/271), done.
To data1.praxis.kousu.ca:/jcohen/data-single-subject.git
 * [new branch]      master -> master
[kousu@requiem data-single-subject]$ git annex copy --to=praxis
(recording state in git...)
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
git-annex: cannot determine uuid for praxis (perhaps you need to run "git annex sync"?)
[kousu@requiem data-single-subject]$ git annex sync
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
commit 
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
ok
pull praxis 
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (2/2), 138 bytes | 138.00 KiB/s, done.
From data1.praxis.kousu.ca:/jcohen/data-single-subject
 * [new branch]      git-annex  -> praxis/git-annex
ok
pull origin 
ok
(merging praxis/git-annex into git-annex...)
(recording state in git...)
push praxis 
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Enumerating objects: 1852, done.
Counting objects: 100% (1852/1852), done.
Delta compression using up to 4 threads
Compressing objects: 100% (760/760), done.
Writing objects: 100% (1851/1851), 126.85 KiB | 15.86 MiB/s, done.
Total 1851 (delta 768), reused 1518 (delta 586), pack-reused 0
remote: Resolving deltas: 100% (768/768), done.
To data1.praxis.kousu.ca:/jcohen/data-single-subject.git
 * [new branch]      git-annex -> synced/git-annex
 * [new branch]      master -> synced/master
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
ok
push origin 
Username for 'https://github.com': ^C
[kousu@requiem data-single-subject]$ git annex copy --to=praxis
copy derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly': 
(to praxis...) 
ok                                
copy derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (to praxis...) 
ok                                
copy derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_T1w.nii.gz (to praxis...) 
ok                                   
copy sub-chiba750/anat/sub-chiba750_T2star.nii.gz (to praxis...) 
ok                                 
copy sub-chiba750/anat/sub-chiba750_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (to praxis...) 
ok                                  
copy sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (to praxis...) 
ok                                
copy sub-douglas/anat/sub-douglas_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-douglas/anat/sub-douglas_T2star.nii.gz (to praxis...) 
ok                                
copy sub-douglas/anat/sub-douglas_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-douglas/dwi/sub-douglas_dwi.nii.gz (to praxis...) 
ok                                 
copy sub-glen/anat/sub-glen_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-glen/anat/sub-glen_T2star.nii.gz (to praxis...) 
ok                                
copy sub-glen/anat/sub-glen_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-glen/dwi/sub-glen_dwi.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (to praxis...) 
ok                                   
copy sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (to praxis...) 
ok                                 
copy sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (to praxis...) 
ok                                 
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (to praxis...) 
ok                                
copy sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (to praxis...) 
ok                                
copy sub-mgh/anat/sub-mgh_T1w.nii.gz (to praxis...) 
ok                                 
(recording state in git...)

the port 2222 problem

Using a non-standard ssh port is a problem. I know of five solutions:

1.

export GIT_SSH_COMMAND="ssh -p 2222"

or

GIT_SSH_COMMAND="ssh -p 2222" git <subcommand> ....

Each user uses this each time they use the server.

ssh_config

Each users adds this one-time to each new machine, at the same time as they provide their ssh key.

cat >> ~/.ssh/config <<EOF
Host match *.praxis.kousu.ca
Port 2222
EOF

Use a separate IP address

ComputeCanada lets us allocate multiple IP addresses per machine. The inscription form asks if you want 1 or 2. If we had a second IP address, we could bind one of them to the underlying OS and the other to GIN.

Here's someone claiming to do this with Gitlab+docker: https://serverfault.com/a/951985

Put the OS sshd on a different port

Just swap the two ports:

edit /etc/ssh/sshd_conf to set Port 2222
change the docker run line to docker run -p 3000:3000 -p 22:22 -d gnode/gin-web:live

Then the sysadmins need to know to use

ssh -p 2222 ubuntu@data1.praxis.kousu.ca

when they need to log in to fix something. That will hopefully be pretty rare, though. They could even do this:

cat >> ~/.ssh/config <<EOF
Host sysadmin-data1.praxis
HostName data1.praxis,kousu.ca
Port 2222
EOF

And users don't need to do anything special.

Install GIN without docker

The docker image comes with a built in ssh server. If we install GIN on the base system and share the system ssh there won't be a second port to worry about.

This is more work because it requires rebuilding their package in a non-docker way. It's my preference though. I would like to build a .deb so you can "apt-get install gin" and have everything Just Work.

We could also make this package deploy dehydrated and nginx as above, to save even more time for the users.

Gogs has a semi-official workaround at http://www.ateijelo.com/blog/2016/07/09/share-port-22-between-docker-gogs-ssh-and-local-system

Demo day went well by the way. https://praxisinstitute.org/ seems happy to investigate this direction.

Shortcuts taken that should be corrected:

I didn't deploy the dehydrated cronjobs to renew certs
I didn't deploy unattended-upgrades to keep the system up to date
I haven't tested it docker will survive a reboot
I told it the wrong hostname (it knows itself as localhost and by its IP address, not by data1.praxis.kousu.ca)
It has no email server, so it can't send password resets

All of these could be fixed quick by bringing this server under ansible, but I wrote into the ansible scripts the assumption that all servers are under *.neuro.polymtl.ca, so I'd need to fix that first.

Also

there are no backups

I've been working on adding this to the lab's configuration management today (for those who have access, that's at https://github.com/neuropoly/computers/pull/227). To that end, I'm re-purposing the resources allocated to praxis-gin to be for vaughan-test.neuropoly.org, which will be our dev server for data.neuropoly.org.

And https://data.neuropoly.org will be just as good a demo for Praxis the next time we talk with them. And with the ansible work you're doing it will even be reproducible for them to build their own https://data.praxisinstitute.org :)

Some competitors:

Possible collaborators:

https://portagenetwork.ca/

Portals (where we could potentially get ourselves listed, especially if we help them out by making sure we have APIs available):

Part of our promise of standards-compliant security was to run fail2ban but maybe pam_faillock is an easier choice (https://github.com/neuropoly/computers/issues/168#issuecomment-1008239662). But perhaps not.

We had a meeting with Praxis today:

The research centers in Vancouver, Toronto and Calgary have gotten ethics approval to upload to the server we're making
The other 4 research centers are bogged down in ethics
For a minimum-viable product, we'll have a single server and we'll use Gitea Organizations to do inter/intra-research center access control
We're aiming to have something in production by March.

Some other related work that came up:

FHIR is a spec out of europe for medical data; it's comparable to git-annex + BIDS as an ensemble.

@taowa is going to contact ComputeCanada asking them to extend our allocation on https://docs.computecanada.ca/wiki/Cloud_resources#Arbutus_cloud_.28arbutus.cloud.computecanada.ca.29 from 2 VPSes to 3 -- one for data-test.neuropoly.org, one for data.neuropoly.org, and one for data.praxisinstitute.org.

We've done a lot of work in our private repo at https://github.com/neuropoly/computers/issues/167 (the praxis-specific part is at https://github.com/neuropoly/computers/pull/332) on this. We've got an ansible deployment, a fork of gitea (https://github.com/neuropoly/gitea/pull/1/); and have a demo server at https://data.praxisinstitute.org.dev.neuropoly.org/. Eventually we will want to extract those ansible scripts and publish them on Galaxy

Today we talked to Praxis and David Cadotte again and got an update on how their data negotiations are going:

Ottawa (n = 11); curator: ?; Data Sharing Agreement* Signed; data BIDS-ified
- in a week or two will have a dataset we can trial uploading; meaning we need to have data.praxisinstitute.org live and in production by then
Halifax (n = 20); curator: ?; ; data BIDS-ified?
Calgary (n = 9); curator: ?; data BIDS-ified?
Montreal (n = 68); curator: ?
Vancouver: waiting on ethics approval
Hamilton (n > 150): waiting on ethics approval
Quebec City: not yet started

Each site is very different and needs help adapting to their environment; they have different PACS, different OSes, different levels of familiarity with the command line. David has been spending time giving tech support to some of the sites' curators to help get their data in BIDS format. We have created a wiki here to gather the information David has been teaching and anything we learn during our trial dataset uploads; it's here on GitHub but could migrated to https://data.praxisinstitute.org, once that's live (and eventually perhaps these docs could even be rolled into the ansible deployment, as a standard part of Neurogitea?).

We will be in touch with Praxis's IT team in the next couple weeks so we can migrate https://data.praxisinstitute.org.dev.neuropoly.org -> https://data.praxisinstitute.org.

We got some branding feedback from Praxis Institute for the soon-to-be https://data.praxisinstitute.org:

We have received some feedback from our director of marketing, and she really liked the website header colors (no need to change the text). She did provide couple of suggestions:

The logo resolution is appearing quite low on the website header (compared to the tagline text), could you please adjust it to look more as in the svg logo file?

Is it possible to use a word mark for Neurogitea? Something similar to the one attached would be great!

For paragraph text, will it be possible to have a white or very light background with dark text colour (black, dark grey, etc.)?

(Note that the current demo simply uses the arc-green theme which comes bundled with Gitea, as described in this section of the Gitea customization docs.)

[x] For point 1, it was pixelated because we had been using an earlier PNG version of the logo, but we have a nice SVG version now. I'm attaching two slightly modified versions of the logo here:
- praxis.svg is a version with black text, suitable for use on a light background (for example, the big main logo)
- praxis-rev.svg is a version with white text, suitable for use on a dark background (for example, in the page header if that stays dark).
[x] For point 2, I managed to get an SVG version of the wordmark (that is, the word "Neurogitea" but in a fancy font) which doesn't depend on specific fonts being installed on the viewer's computer: wordmark.svg
[x] For point 3, we should be able to tweak arc-green (source link) into a new theme for Gitea to use. I'll note some colours used by the Praxis Institute website:
- dark header background: #161616
- light body background: #fefefe
- wordmark text: #000000
- paragraph text: #646464
- blue logo: #00bed6

You extracted wordmark.svg! Niiice.

On my end, I emailed R. Foley at Praxis to ask to get DNS reassigned to our existing ComputeCanada instances, so that we will have https://data.praxisinstitute.org in place of https://data.praxisinstitute.org.dev.neuropoly.org.

EDIT: R. Foley got back to say that they don't want to give us praxisinstitute.org, but will talk to their marketing team and decide on an appropriate domain we can have.

On the demo server I just saw this probe from Mongolia:

180.149.125.168 - - [07/May/2022:19:21:42 -0400] "GET /c/ HTTP/1.1" 307 180 "-" "Mozilla/5.0 (Windows NT 5.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"

so it occurs to me that maybe we should impose geoblocking on Praxis's server. It's meant to be a pan-Canada project, so maybe we should impose firewall rules that actually enforce that it's only pan-Canadian. That's a bit tricky to do; I guess we can extract ip blocks from MaxMind's geoip-database and feed them into iptables?

* [ ]  For point 3, we should be able to tweak `arc-green` ([source link](https://github.com/go-gitea/gitea/blob/main/web_src/less/themes/theme-arc-green.less)) into a new theme for Gitea to use. I'll note some colours used by [the Praxis Institute website](https://praxisinstitute.org/):

  * dark header background: #161616
  * light body background: #fefefe
  * wordmark text: #000000
  * paragraph text: #646464
  * blue logo: #00bed6

theme-praxis.css:

:root {
  --color-body: #fefefe;
  --color-text: #646464;
  --color-primary: #00bed6; /* blue of the logo */
  --color-primary-dark-1: #00a6bb;
  --color-secondary: #b4008d; /* purple used to give some visual contrast */
  --color-secondary-dark-1: #8a016c;
  --color-warning-bg: #ffb600; /* yellow used as a tertiary colour */
  --color-warning-bg-dark-1: #D99C03;

  --color-menu: var(--color-body);
}

.following.bar.light {
  /* nav bar is dark themed, in contrast to the rest of the site */
  --color-body: #161616;
  --color-text: #bbc0ca;
}

.following.bar.light .dropdown .menu {
  /* but dropdowns within the navbar are back to light themed */
  --color-body: #fefefe;
  --color-text: #646464;
}

.ui.basic.green.buttons .button:hover, .ui.basic.green.button {
  color: var(--color-primary);
}

.ui.basic.green.buttons .button:hover, .ui.basic.green.button:hover {
  color: var(--color-primary-dark-1);
}

.ui.green.buttons .button, .ui.green.button {
  background-color: var(--color-primary);
}

.ui.green.buttons .button, .ui.green.button:hover {
  background-color: var(--color-primary-dark-1);
}

.ui.red.buttons .button, .ui.red.button {
  background-color: var(--color-warning-bg);
}

.ui.red.buttons .button, .ui.red.button:hover {
  background-color: var(--color-warning-bg-dark-1);
}

Looks like:

Screenshot 2022-05-24 at 12-04-28 Gitea Git with a cup of tea Screenshot 2022-05-24 at 12-03-44 Gitea Git with a cup of tea

I went one step slightly further than the default themes and themed the yes/no buttons as blue/yellow (matching colours I got off https://praxisinstitute.org/) instead of the default green/red.

I'll integrate this into ansible this afternoon.

After that I'll replace the logos.

* [ ]  For point 1, it was pixelated because we had been using an earlier PNG version of the logo, but we have a nice SVG version now. I'm attaching two slightly modified versions of the logo here:

  * [praxis.svg](https://user-images.githubusercontent.com/928742/167062346-6c35192c-fd87-4052-8456-b005fd8dbab1.svg) is a version with black text, suitable for use on a light background (for example, the big main logo)

I followed the instructions:

make generate-images

``` git clone https://github.com/go-gitea/gitea cd gitea cp praxis.svg assets/logo.svg make generate-images # wait a while mkdir -p mkdir custom/public/img cp public/img/{apple-touch-icon.png,avatar_default.png,{favicon,logo}.{png,svg}} custom/public/img # or somewhere else to stage it ```

But it came out badly:

Screenshot 2022-05-24 at 22-06-10 Gitea Git with a cup of tea

After mulling for a while, I opened it up in Inkscape and saw the problem: the viewport is bizarrely too large. It's set to

viewBox="0 0 1520 1230"

With Inkscape helping me measure things, I worked out that the tight viewport is

viewBox="300 297 935 663"

So here's that file: praxis.svg

you can see the difference here

![Screenshot_20220524_225602](https://user-images.githubusercontent.com/987487/170170310-998bbc4a-e5fa-4081-a9ef-1d710bb83dc4.png) ![Screenshot_20220524_225612](https://user-images.githubusercontent.com/987487/170170311-ee739c06-22a8-4db3-a12d-d6084f182433.png)

With this, it's a lot better:

Screenshot 2022-05-24 at 23-00-06 Gitea Git with a cup of tea

  * [praxis-rev.svg](https://user-images.githubusercontent.com/928742/167062353-93ce125e-5aac-48b7-80f7-e30ac9494946.svg) is a version with white text, suitable for use on a dark background (for example, in the page header if that stays dark).

Thanks, but I think I'm going to end up skipping praxis-rev.svg. For one thing, I'd prefer to control its colours via CSS in the theme file (which is a thing you can do with svgs) for another, the logo is really way too small with the text attached for the navbar, so I'm just going to cut it off and leave the blue butterfly-spine, which isn't light/dark sensitive.

And here's that file: logo.svg

With this, the navbar looks better:

Screenshot 2022-05-24 at 23-13-29 Gitea Git with a cup of tea

but now the cover page is missing the title, because the cover logo and the navbar logo are the same file, so I need to separate the two and customize the cover page to know that. I have to anyway to handle the wordmark part.

I put

cp praxis.svg custom/public/img/logo-home.svg

$ find . -name home.tmpl 
./templates/org/home.tmpl
./templates/home.tmpl
./templates/repo/home.tmpl
$ mkdir -p custom/templates
$ cp templates/home.tmpl custom/templates/
$ vi custom/templates/home.tmpl

And made this:

{{template "base/head" .}}
<div class="page-content home">
    <div class="ui stackable middle very relaxed page grid">
        <div class="sixteen wide center aligned centered column">
            <div>
                <img class="logo" width="220" height="220" src="{{AssetUrlPrefix}}/img/logo-home.svg"/>
            </div>
            <div class="hero">
                <h1 class="ui icon header title">
                    {{AppName}}
                </h1>
                <h2>{{.i18n.Tr "startpage.app_desc"}}</h2>
            </div>
        </div>
    </div>
</div>
{{template "base/footer" .}}

diff

```diff --- templates/home.tmpl 2021-12-17 11:13:14.466947322 -0500 +++ custom/templates/home.tmpl 2022-05-24 23:20:53.533449377 -0400 @@ -3,7 +3,7 @@

-

+

-

- -

- {{.i18n.Tr "startpage.install_desc" | Str2html}} -

-

- -

- {{.i18n.Tr "startpage.platform_desc" | Str2html}} -

-

- -

- {{.i18n.Tr "startpage.lightweight_desc" | Str2html}} -

-

- -

- {{.i18n.Tr "startpage.license_desc" | Str2html}} -

-

* [ ]  For point 2, I managed to get an SVG version of the wordmark (that is, the word "Neurogitea" but in a fancy font) which doesn't depend on specific fonts being installed on the viewer's computer: [wordmark.svg](https://user-images.githubusercontent.com/928742/167062360-18c799d9-4da6-4911-bfa6-a064f7e115a3.svg)

For this, I put

cp wordmark.svg custom/public/img/neurogitea-wordmark.svg

And did this patch to what I had above:

diff --git a/custom/public/css/theme-praxis.css b/custom/public/css/theme-praxis.css
index a1665744f..d8cf3faea 100644
--- a/custom/public/css/theme-praxis.css
+++ b/custom/public/css/theme-praxis.css
@@ -47,3 +47,9 @@
 .ui.red.buttons .button, .ui.red.button:hover {
   background-color: var(--color-warning-bg-dark-1);
 }
+
+/* the neurogitea wordmark needs some CSS resets to display properly */
+.ui.header > img.logo {
+  max-width: none;
+  width: 500px;
+}
diff --git a/custom/templates/home.tmpl b/custom/templates/home.tmpl
index d7d1d8501..2aaaa176b 100644
--- a/custom/templates/home.tmpl
+++ b/custom/templates/home.tmpl
@@ -7,7 +7,7 @@
                        </div>
                        <div class="hero">
                                <h1 class="ui icon header title">
-                                       {{AppName}}
+                                       <img class="logo" src="{{AssetUrlPrefix}}/img/neurogitea-wordmark.svg"/>
                                </h1>
                                <h2>{{.i18n.Tr "startpage.app_desc"}}</h2>
                        </div>

And now I've got

Screenshot 2022-05-24 at 23-37-11 Gitea Git with a cup of tea

Some other praxis-specific things to include:

# app.ini
APP_NAME = Neurogitea

[ui]
THEMES = praxis
DEFAULT_THEME = praxis

[ui.meta]
AUTHOR = "Praxis Spinal Cord Institute"
DESCRIPTION = "Neurogitea connects spinal cord researchers with each others' data"
KEYWORDS = "bids,data sharing,git-annex,datalad,git-lfs,reproducible science" # ?

Theming is sitting on https://github.com/neuropoly/computers/pull/332/commits/b365da08c69c67509bbcdcbffe3348cda521cfd0 (sorry it's in the private repo; extracting and publishing to Ansible Galaxy will be a Real-Soon-Now goal)

On my end, I emailed R. Foley at Praxis to ask to get DNS reassigned to our existing ComputeCanada instances, so that we will have https://data.praxisinstitute.org in place of https://data.praxisinstitute.org.dev.neuropoly.org.

EDIT: R. Foley got back to say that they don't want to give us praxisinstitute.org, but will talk to their marketing team and decide on an appropriate domain we can have.

They've made a decision: spineimage.ca. I've asked them to assign

      spineimage.ca   206.12.97.250
drone.spineimage.ca   206.12.93.20

When that's done, I'll add those domains in https://github.com/neuropoly/computers/pull/332; and then we should maybe think about pointing data.praxisinstitute.org.dev.neuropoly.org back at some servers on Amazon again to host a staging server we can use without knocking out their prod server.

Meeting - site_03

We had a meeting today with Praxis, including the first trial data curator.

David Cadotte had helped her already curate the dataset into BIDS. We successfully uploaded it to https://spineimage.ca/TOH/site_03.

User Documentation

David Cadotte has a draft curator tutorial :lock_with_ink_pen: . I started the same document on the wiki here but his is further along.

The next step is that David, me and the trial curator are going to upload a trial dataset to https://data.praxisinstitute.org.dev.neuropoly.org/ together. We will be picking a meeting time via Doodle soon.

The curator has been using bids-validator, but it sounded like they were using the python version, not the javascript one. The javascript one is incomplete but the python version is even more incomplete. This is something I should check on when we get together to upload the dataset.

Prod Plan: https://spinalimage.ca

In parallel, we will finish up migrating to spineimage.ca, the "prod" site, and sometime next month we should have 4 - 5 curators ready.

Future dev plan: https://spineimage.ca.dev.neuropoly.org

We'll have to repurpose the existing VMs to become prod. But I would like to keep the staging site so we can have something to experiment on. I could experiment locally but I don't have an easy way to turn off or mock https, so it's simpler just to have a mock server with a real cert from LetsEncrypt. I'll rename it spinalimage.ca.dev.neuropoly.org

But there's a problem: ComputeCanada gave has given us three VMs but only two public IPs; the current version of neurogitea needs two IPs.

Some ideas:

ask ComputeCanada for another VM and two more IPs
- con: they are probably getting tired of us asking for more resources
move the staging server to DigitalOcean or Amazon
- con: if we size the instances to be even somewhat realistic this is very expensive (~100$/mth)
pay for a small VM on DigitalOcean or Amazon, set up a reverse tunnel from the third VM on ComputeCanada using autossh (or maybe even wireguard?)
- con: added sysadmin complexity
merge https://github.com/neuropoly/computers/pull/321
- con: this goes against Drone's advice to not run everything on a single server; though it works, I've tested it.
finish @mguaypaq's work on replacing Drone with an inline bids-validator-only gitea plugin
- con: I don't want to rush @mguaypaq

Summary for today: we were able to connect with Lisa J. from the site_012, and had some pretty good success.

We managed to install the required dependencies for curation on Windows, and updated the shared Google doc with the steps taken.
We got dcm2bids to run (including dcm2niix), but we suspect the configuration file isn't complete yet. We'll need to read some documentation before we can guide people along the further steps.

Also, we noticed that for site_03, the participants.json file contains row data, which should only be in participants.tsv, so we should open an issue with Maryam to fix it, and make the curation documentation clearer on that point.

Meeting - site_012

Lisa didn't yet have her dataset curated. We got halfway through curation, and did not start at all on uploading. Also like @mguaypaq said, we were figuring out Windows remotely as we went, never having used much of this software stack ourselves there.

We didn't have Administrator on the computer she was working on. The git installer was able to handle this by itself, but we had to tweak the installer settings for both git-annex and python to make sure to install to C:\Users\%USER\AppData\Local\Programs and not to try to install anything system-wide.

dcm2niix continues to be tricky because it doesn't have an installer. It just has zip files of binaries. We put it in C:\Users\%USER%\bin, because git-bash has that on its $PATH, but it's unclear if that's a good long-term recommendation. It's in apt and brew, and there's a conda package that could be used on Windows, if we were to get people to install conda first.

By the way, we skipped using pycharm or a virtualenv, and that worked fine. Our curators are not often developers, so explaining virtualenvs is a whole extra can of worms that derails the training. venv only helps when you're developing many python projects and those projects have incompatible dependencies -- regular end users should just be able to pip install anything and mostly have things work (and if they don't, it should be a bug on the shoulders of the developers of those softwares).

Backups

Put backups on https://docs.computecanada.ca/wiki/Arbutus_Object_Storage. Even if backups are encrypted, the data sharing agreement says backups need to stay within the cluster.

Meeting - site_012

Today we had a meeting for 2 hours.

We gave @jcohenadad an account on https://spineimage.ca/, and made sure David Cadotte and our site_012 curator remembered their passwords too.
David Cadotte helped the site 12 curator use dcm2bids_helper and dcm2bids to construct and test a dcm2bids config file. It is a tedious process, doing:
1. dcm2bids_helper -d sourcedata/$subject_id --force
2. Examine the contents of all tmp_dcm2bids/helper/*.json files
3. Create an entry in code/dcm2bids_config.json that matches (SeriesDescription, ProtocolName) from those JSON files with (dataType, modalityLabel, customLabels).
4. Running dcm2bids -c code/dcm2bids_config.json -d sourcedata/$subject_id -p $sequential_id
David estimated it takes half an hour per subject, even once the curator is fluent with it.

(note that the mapping between $subject_id and $sequential_id is secret, and maintained by individual curators and Praxis)
We decided to update the curation protocol by adding a site prefix to the subject IDs, e.g. hal019 for subject 19 from Halifax, ott002 for subject 2 from Ottawa.

@mguaypaq has helped me merge in HTTP downloads, which means we now can host open access datasets on any server we choose, in any country we can find one.

I've tested it by moving a complete copy of https://github.com/spine-generic/data-single-subject onto https://data.dev.neuropoly.org/: https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/.

Uploading

First, download the current copy: ``` p115628@joplin:~/datasets$ git clone https://github.com/spine-generic/data-single-subject Clonage dans 'data-single-subject'... remote: Enumerating objects: 2378, done. remote: Counting objects: 100% (545/545), done. remote: Compressing objects: 100% (344/344), done. remote: Total 2378 (delta 55), reused 384 (delta 49), pack-reused 1833 Réception d'objets: 100% (2378/2378), 299.46 Kio | 1.42 Mio/s, fait. Résolution des deltas: 100% (578/578), fait. p115628@joplin:~/datasets$ cd data-single-subject/ p115628@joplin:~/datasets/data-single-subject$ git clone^C p115628@joplin:~/datasets/data-single-subject$ time git annex get (merging origin/git-annex origin/synced/git-annex into git-annex...) (recording state in git...) (scanning for unlocked files...) get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (from amazon...) (checksum...) ok get derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T1w.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T2star.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T2w.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (from amazon...) (checksum...) ok get sub-douglas/anat/sub-douglas_T1w.nii.gz (from amazon...) (checksum...) ok get sub-douglas/anat/sub-douglas_T2star.nii.gz (from amazon...) (checksum...) ok get sub-douglas/anat/sub-douglas_T2w.nii.gz (from amazon...) (checksum...) ok get sub-douglas/dwi/sub-douglas_dwi.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_T1w.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_T2star.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_T2w.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-glen/dwi/sub-glen_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_T1w.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_T2star.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_T2w.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-mgh/anat/sub-mgh_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-mgh/dwi/sub-mgh_acq-b0_dwi.nii.gz (from amazon...) (checksum...) ok get sub-mgh/dwi/sub-mgh_dwi.nii.gz (from amazon...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w.nii.gz (from amazon...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2star.nii.gz (from amazon...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2w.nii.gz (from amazon...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-oxfordFmrib/dwi/sub-oxfordFmrib_dwi.nii.gz (from amazon...) (checksum...) ok get sub-perform/anat/sub-perform_T1w.nii.gz (from amazon...) (checksum...) ok get sub-perform/anat/sub-perform_T2star.nii.gz (from amazon...) (checksum...) ok get sub-perform/anat/sub-perform_T2w.nii.gz (from amazon...) (checksum...) ok get sub-perform/anat/sub-perform_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-perform/anat/sub-perform_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-perform/anat/sub-perform_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-perform/dwi/sub-perform_dwi.nii.gz (from amazon...) (checksum...) ok get sub-poly/anat/sub-poly_T1w.nii.gz (from amazon...) (checksum...) ok get sub-poly/anat/sub-poly_T2star.nii.gz (from amazon...) (checksum...) ok get sub-poly/anat/sub-poly_T2w.nii.gz (from amazon...) (checksum...) ok get sub-poly/anat/sub-poly_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-poly/anat/sub-poly_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-poly/anat/sub-poly_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-poly/dwi/sub-poly_dwi.nii.gz (from amazon...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_T1w.nii.gz (from amazon...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_T2star.nii.gz (from amazon...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_T2w.nii.gz (from amazon...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyo750w/dwi/sub-tokyo750w_dwi.nii.gz (from amazon...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_T1w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2star.nii.gz (from amazon...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoIngenia/dwi/sub-tokyoIngenia_dwi.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_T1w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2star.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna1/dwi/sub-tokyoSigna1_dwi.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2star.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_T1w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2star.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2w.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-tokyoSkyra/dwi/sub-tokyoSkyra_dwi.nii.gz (from amazon...) (checksum...) ok get sub-ucl/anat/sub-ucl_T1w.nii.gz (from amazon...) (checksum...) ok get sub-ucl/anat/sub-ucl_T2star.nii.gz (from amazon...) (checksum...) ok get sub-ucl/anat/sub-ucl_T2w.nii.gz (from amazon...) (checksum...) ok get sub-ucl/anat/sub-ucl_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-ucl/anat/sub-ucl_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-ucl/anat/sub-ucl_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-ucl/dwi/sub-ucl_dwi.nii.gz (from amazon...) (checksum...) ok get sub-unf/anat/sub-unf_T1w.nii.gz (from amazon...) (checksum...) ok get sub-unf/anat/sub-unf_T2star.nii.gz (from amazon...) (checksum...) ok get sub-unf/anat/sub-unf_T2w.nii.gz (from amazon...) (checksum...) ok get sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (from amazon...) (checksum...) ok get sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (from amazon...) (checksum...) ok get sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (from amazon...) (checksum...) ok get sub-unf/dwi/sub-unf_dwi.nii.gz (from amazon...) (checksum...) ok (recording state in git...) real 1m10,223s user 0m52,723s sys 0m5,956s ``` Then upload: ``` p115628@joplin:~/datasets/data-single-subject$ git remote add gg gitea@data.dev.neuropoly.org:nick.guenther/spine-generic-single.git p115628@joplin:~/datasets/data-single-subject$ git push -u gg master Énumération des objets: 720, fait. Décompte des objets: 100% (720/720), fait. Compression par delta en utilisant jusqu'à 128 fils d'exécution Compression des objets: 100% (418/418), fait. Écriture des objets: 100% (720/720), 153.85 Kio | 153.85 Mio/s, fait. Total 720 (delta 272), réutilisés 719 (delta 272), réutilisés du pack 0 remote: Résolution des deltas: 100% (272/272), fait. remote: . Processing 1 references remote: Processed 1 references in total To data.dev.neuropoly.org:nick.guenther/spine-generic-single.git * [new branch] master -> master La branche 'master' est paramétrée pour suivre la branche distante 'master' depuis 'gg'. p115628@joplin:~/datasets/data-single-subject$ git annex sync gg commit Sur la branche master Votre branche est à jour avec 'gg/master'. rien à valider, la copie de travail est propre ok pull gg remote: Énumération des objets: 2, fait. remote: Décompte des objets: 100% (2/2), fait. remote: Total 2 (delta 0), réutilisés 0 (delta 0), réutilisés du pack 0 Dépaquetage des objets: 100% (2/2), 137 octets | 137.00 Kio/s, fait. Depuis data.dev.neuropoly.org:nick.guenther/spine-generic-single * [nouvelle branche] git-annex -> gg/git-annex ok (merging gg/git-annex into git-annex...) (recording state in git...) push gg Énumération des objets: 2093, fait. Décompte des objets: 100% (2093/2093), fait. Compression par delta en utilisant jusqu'à 128 fils d'exécution Compression des objets: 100% (1205/1205), fait. Écriture des objets: 100% (2092/2092), 135.47 Kio | 6.16 Mio/s, fait. Total 2092 (delta 988), réutilisés 1085 (delta 305), réutilisés du pack 0 remote: Résolution des deltas: 100% (988/988), fait. remote: remote: Create a new pull request for 'synced/git-annex': remote: https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/compare/master...synced/git-annex remote: remote: remote: Create a new pull request for 'synced/master': remote: https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/compare/master...synced/master remote: remote: .. Processing 2 references remote: Processed 2 references in total To data.dev.neuropoly.org:nick.guenther/spine-generic-single.git * [new branch] git-annex -> synced/git-annex * [new branch] master -> synced/master ok p115628@joplin:~/datasets/data-single-subject$ time git annex copy --to gg copy derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (to gg...) ok copy derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (to gg...) ok copy sub-chiba750/anat/sub-chiba750_T1w.nii.gz (to gg...) ok copy sub-chiba750/anat/sub-chiba750_T2star.nii.gz (to gg...) ok copy sub-chiba750/anat/sub-chiba750_T2w.nii.gz (to gg...) ok copy sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (to gg...) ok copy sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (to gg...) ok copy sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (to gg...) ok copy sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (to gg...) ok copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (to gg...) ok copy sub-douglas/anat/sub-douglas_T1w.nii.gz (to gg...) ok copy sub-douglas/anat/sub-douglas_T2star.nii.gz (to gg...) ok copy sub-douglas/anat/sub-douglas_T2w.nii.gz (to gg...) ok copy sub-douglas/dwi/sub-douglas_dwi.nii.gz (to gg...) ok copy sub-glen/anat/sub-glen_T1w.nii.gz (to gg...) ok copy sub-glen/anat/sub-glen_T2star.nii.gz (to gg...) ok copy sub-glen/anat/sub-glen_T2w.nii.gz (to gg...) ok copy sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-glen/dwi/sub-glen_dwi.nii.gz (to gg...) ok copy sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (to gg...) ok copy sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (to gg...) ok copy sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (to gg...) ok copy sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (to gg...) ok copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (to gg...) ok copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (to gg...) ok copy sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (to gg...) ok copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (to gg...) ok copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (to gg...) ok copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (to gg...) ok copy sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (to gg...) ok copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (to gg...) ok copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (to gg...) ok copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (to gg...) ok copy sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (to gg...) ok copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (to gg...) ok copy sub-mgh/anat/sub-mgh_T1w.nii.gz (to gg...) ok copy sub-mgh/anat/sub-mgh_T2star.nii.gz (to gg...) ok copy sub-mgh/anat/sub-mgh_T2w.nii.gz (to gg...) ok copy sub-mgh/anat/sub-mgh_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-mgh/anat/sub-mgh_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-mgh/anat/sub-mgh_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-mgh/dwi/sub-mgh_acq-b0_dwi.nii.gz (to gg...) ok copy sub-mgh/dwi/sub-mgh_dwi.nii.gz (to gg...) ok copy sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w.nii.gz (to gg...) ok copy sub-oxfordFmrib/anat/sub-oxfordFmrib_T2star.nii.gz (to gg...) ok copy sub-oxfordFmrib/anat/sub-oxfordFmrib_T2w.nii.gz (to gg...) ok copy sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-oxfordFmrib/dwi/sub-oxfordFmrib_dwi.nii.gz (to gg...) ok copy sub-perform/anat/sub-perform_T1w.nii.gz (to gg...) ok copy sub-perform/anat/sub-perform_T2star.nii.gz (to gg...) ok copy sub-perform/anat/sub-perform_T2w.nii.gz (to gg...) ok copy sub-perform/anat/sub-perform_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-perform/anat/sub-perform_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-perform/anat/sub-perform_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-perform/dwi/sub-perform_dwi.nii.gz (to gg...) ok copy sub-poly/anat/sub-poly_T1w.nii.gz (to gg...) ok copy sub-poly/anat/sub-poly_T2star.nii.gz (to gg...) ok copy sub-poly/anat/sub-poly_T2w.nii.gz (to gg...) ok copy sub-poly/anat/sub-poly_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-poly/anat/sub-poly_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-poly/anat/sub-poly_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-poly/dwi/sub-poly_dwi.nii.gz (to gg...) ok copy sub-tokyo750w/anat/sub-tokyo750w_T1w.nii.gz (to gg...) ok copy sub-tokyo750w/anat/sub-tokyo750w_T2star.nii.gz (to gg...) ok copy sub-tokyo750w/anat/sub-tokyo750w_T2w.nii.gz (to gg...) ok copy sub-tokyo750w/anat/sub-tokyo750w_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-tokyo750w/anat/sub-tokyo750w_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-tokyo750w/anat/sub-tokyo750w_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-tokyo750w/dwi/sub-tokyo750w_dwi.nii.gz (to gg...) ok copy sub-tokyoIngenia/anat/sub-tokyoIngenia_T1w.nii.gz (to gg...) ok copy sub-tokyoIngenia/anat/sub-tokyoIngenia_T2star.nii.gz (to gg...) ok copy sub-tokyoIngenia/anat/sub-tokyoIngenia_T2w.nii.gz (to gg...) ok copy sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-tokyoIngenia/dwi/sub-tokyoIngenia_dwi.nii.gz (to gg...) ok copy sub-tokyoSigna1/anat/sub-tokyoSigna1_T1w.nii.gz (to gg...) ok copy sub-tokyoSigna1/anat/sub-tokyoSigna1_T2star.nii.gz (to gg...) ok copy sub-tokyoSigna1/anat/sub-tokyoSigna1_T2w.nii.gz (to gg...) ok copy sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-tokyoSigna1/dwi/sub-tokyoSigna1_dwi.nii.gz (to gg...) ok copy sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w.nii.gz (to gg...) ok copy sub-tokyoSigna2/anat/sub-tokyoSigna2_T2star.nii.gz (to gg...) ok copy sub-tokyoSigna2/anat/sub-tokyoSigna2_T2w.nii.gz (to gg...) ok copy sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi.nii.gz (to gg...) ok copy sub-tokyoSkyra/anat/sub-tokyoSkyra_T1w.nii.gz (to gg...) ok copy sub-tokyoSkyra/anat/sub-tokyoSkyra_T2star.nii.gz (to gg...) ok copy sub-tokyoSkyra/anat/sub-tokyoSkyra_T2w.nii.gz (to gg...) ok copy sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-tokyoSkyra/dwi/sub-tokyoSkyra_dwi.nii.gz (to gg...) ok copy sub-ucl/anat/sub-ucl_T1w.nii.gz (to gg...) ok copy sub-ucl/anat/sub-ucl_T2star.nii.gz (to gg...) ok copy sub-ucl/anat/sub-ucl_T2w.nii.gz (to gg...) ok copy sub-ucl/anat/sub-ucl_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-ucl/anat/sub-ucl_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-ucl/anat/sub-ucl_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-ucl/dwi/sub-ucl_dwi.nii.gz (to gg...) ok copy sub-unf/anat/sub-unf_T1w.nii.gz (to gg...) ok copy sub-unf/anat/sub-unf_T2star.nii.gz (to gg...) ok copy sub-unf/anat/sub-unf_T2w.nii.gz (to gg...) ok copy sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (to gg...) ok copy sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (to gg...) ok copy sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (to gg...) ok copy sub-unf/dwi/sub-unf_dwi.nii.gz (to gg...) ok (recording state in git...) real 0m24,742s user 0m2,539s sys 0m3,207s ``` This DigitalOcean server is crazy fast by the way: that's 293.66Mb/s upload. Downloading from Amazon was still pretty fast but it was "only" 100Mb/s. And the DigitalOcean server is in Toronto but the Amazon servers are (supposed to be) in Montreal. ---

I went to its settings to make it Public:

Screenshot 2022-09-22 at 21-19-17 spine-generic-single

then I downloaded anonymously, with the same commands we tell people to currently use against the GitHub copy:

Download

``` p115628@joplin:~/datasets$ time git clone https://data.dev.neuropoly.org/nick.guenther/spine-generic-single.git data-single-subject-http2 Clonage dans 'data-single-subject-http2'... remote: Enumerating objects: 3243, done. remote: Counting objects: 100% (3243/3243), done. remote: Compressing objects: 100% (1228/1228), done. remote: Total 3243 (delta 1614), reused 2595 (delta 1260), pack-reused 0 Réception d'objets: 100% (3243/3243), 311.17 Kio | 8.19 Mio/s, fait. Résolution des deltas: 100% (1614/1614), fait. real 0m0,317s user 0m0,171s sys 0m0,104s p115628@joplin:~/datasets$ cd data-single-subject-http2 p115628@joplin:~/datasets/data-single-subject-http2$ time git annex get (merging origin/git-annex origin/synced/git-annex into git-annex...) (recording state in git...) (scanning for unlocked files...) get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-tokyo750w/dwi/sub-tokyo750w_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w_RPI_r_seg-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) (checksum...) ok get derivatives/labels/sub-ucl/anat/sub-ucl_T1w_RPI_r_labels-manual.nii.gz (from origin...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T1w.nii.gz (from origin...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T2star.nii.gz (from origin...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_T2w.nii.gz (from origin...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-chiba750/anat/sub-chiba750_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-chiba750/dwi/sub-chiba750_dwi.nii.gz (from origin...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T1w.nii.gz (from origin...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T2star.nii.gz (from origin...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_T2w.nii.gz (from origin...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-chibaIngenia/anat/sub-chibaIngenia_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-chibaIngenia/dwi/sub-chibaIngenia_dwi.nii.gz (from origin...) (checksum...) ok get sub-douglas/anat/sub-douglas_T1w.nii.gz (from origin...) (checksum...) ok get sub-douglas/anat/sub-douglas_T2star.nii.gz (from origin...) (checksum...) ok get sub-douglas/anat/sub-douglas_T2w.nii.gz (from origin...) (checksum...) ok get sub-douglas/dwi/sub-douglas_dwi.nii.gz (from origin...) (checksum...) ok get sub-glen/anat/sub-glen_T1w.nii.gz (from origin...) (checksum...) ok get sub-glen/anat/sub-glen_T2star.nii.gz (from origin...) (checksum...) ok get sub-glen/anat/sub-glen_T2w.nii.gz (from origin...) (checksum...) ok get sub-glen/anat/sub-glen_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-glen/anat/sub-glen_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-glen/anat/sub-glen_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-glen/dwi/sub-glen_dwi.nii.gz (from origin...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T1w.nii.gz (from origin...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T2star.nii.gz (from origin...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_T2w.nii.gz (from origin...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendo750w/anat/sub-juntendo750w_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendo750w/dwi/sub-juntendo750w_dwi.nii.gz (from origin...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T1w.nii.gz (from origin...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2star.nii.gz (from origin...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_T2w.nii.gz (from origin...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoAchieva/anat/sub-juntendoAchieva_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi.nii.gz (from origin...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T1w.nii.gz (from origin...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2star.nii.gz (from origin...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_T2w.nii.gz (from origin...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoPrisma/anat/sub-juntendoPrisma_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoPrisma/dwi/sub-juntendoPrisma_dwi.nii.gz (from origin...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T1w.nii.gz (from origin...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2star.nii.gz (from origin...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_T2w.nii.gz (from origin...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoSkyra/anat/sub-juntendoSkyra_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-juntendoSkyra/dwi/sub-juntendoSkyra_dwi.nii.gz (from origin...) (checksum...) ok get sub-mgh/anat/sub-mgh_T1w.nii.gz (from origin...) (checksum...) ok get sub-mgh/anat/sub-mgh_T2star.nii.gz (from origin...) (checksum...) ok get sub-mgh/anat/sub-mgh_T2w.nii.gz (from origin...) (checksum...) ok get sub-mgh/anat/sub-mgh_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-mgh/anat/sub-mgh_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-mgh/anat/sub-mgh_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-mgh/dwi/sub-mgh_acq-b0_dwi.nii.gz (from origin...) (checksum...) ok get sub-mgh/dwi/sub-mgh_dwi.nii.gz (from origin...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w.nii.gz (from origin...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2star.nii.gz (from origin...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_T2w.nii.gz (from origin...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-oxfordFmrib/anat/sub-oxfordFmrib_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-oxfordFmrib/dwi/sub-oxfordFmrib_dwi.nii.gz (from origin...) (checksum...) ok get sub-perform/anat/sub-perform_T1w.nii.gz (from origin...) (checksum...) ok get sub-perform/anat/sub-perform_T2star.nii.gz (from origin...) (checksum...) ok get sub-perform/anat/sub-perform_T2w.nii.gz (from origin...) (checksum...) ok get sub-perform/anat/sub-perform_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-perform/anat/sub-perform_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-perform/anat/sub-perform_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-perform/dwi/sub-perform_dwi.nii.gz (from origin...) (checksum...) ok get sub-poly/anat/sub-poly_T1w.nii.gz (from origin...) (checksum...) ok get sub-poly/anat/sub-poly_T2star.nii.gz (from origin...) (checksum...) ok get sub-poly/anat/sub-poly_T2w.nii.gz (from origin...) (checksum...) ok get sub-poly/anat/sub-poly_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-poly/anat/sub-poly_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-poly/anat/sub-poly_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-poly/dwi/sub-poly_dwi.nii.gz (from origin...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_T1w.nii.gz (from origin...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_T2star.nii.gz (from origin...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_T2w.nii.gz (from origin...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyo750w/anat/sub-tokyo750w_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyo750w/dwi/sub-tokyo750w_dwi.nii.gz (from origin...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_T1w.nii.gz (from origin...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2star.nii.gz (from origin...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_T2w.nii.gz (from origin...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoIngenia/anat/sub-tokyoIngenia_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoIngenia/dwi/sub-tokyoIngenia_dwi.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_T1w.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2star.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_T2w.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna1/anat/sub-tokyoSigna1_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna1/dwi/sub-tokyoSigna1_dwi.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_T1w.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2star.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_T2w.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna2/anat/sub-tokyoSigna2_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSigna2/dwi/sub-tokyoSigna2_dwi.nii.gz (from origin...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_T1w.nii.gz (from origin...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2star.nii.gz (from origin...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_T2w.nii.gz (from origin...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSkyra/anat/sub-tokyoSkyra_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-tokyoSkyra/dwi/sub-tokyoSkyra_dwi.nii.gz (from origin...) (checksum...) ok get sub-ucl/anat/sub-ucl_T1w.nii.gz (from origin...) (checksum...) ok get sub-ucl/anat/sub-ucl_T2star.nii.gz (from origin...) (checksum...) ok get sub-ucl/anat/sub-ucl_T2w.nii.gz (from origin...) (checksum...) ok get sub-ucl/anat/sub-ucl_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-ucl/anat/sub-ucl_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-ucl/anat/sub-ucl_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-ucl/dwi/sub-ucl_dwi.nii.gz (from origin...) (checksum...) ok get sub-unf/anat/sub-unf_T1w.nii.gz (from origin...) (checksum...) ok get sub-unf/anat/sub-unf_T2star.nii.gz (from origin...) (checksum...) ok get sub-unf/anat/sub-unf_T2w.nii.gz (from origin...) (checksum...) ok get sub-unf/anat/sub-unf_acq-MToff_MTS.nii.gz (from origin...) (checksum...) ok get sub-unf/anat/sub-unf_acq-MTon_MTS.nii.gz (from origin...) (checksum...) ok get sub-unf/anat/sub-unf_acq-T1w_MTS.nii.gz (from origin...) (checksum...) ok get sub-unf/dwi/sub-unf_dwi.nii.gz (from origin...) (checksum...) ok (recording state in git...) real 1m13,143s user 1m4,191s sys 0m5,872s p115628@joplin:~/datasets/data-single-subject-http2$ git remote -v amazon origin https://data.dev.neuropoly.org/nick.guenther/spine-generic-single.git (fetch) origin https://data.dev.neuropoly.org/nick.guenther/spine-generic-single.git (push) ``` Well, maybe I take back what I said: DigitalOcean's download was almost identical to Amazon's. Funny that the uplink was faster, usually it's the other way around.

Also to note: because Push-to-Create is turned on in our gitea config, there was very little fumbling around. A single git annex sync --content should be enough to upload everything, no messing with the web UI.

To emphasize again: we now have a fully alternate copy of https://github.com/spine-generic/data-single-subject. Identical download instructions work for it; all someone has to do is swap in https://data.dev.neuropoly.org/nick.guenther/spine-generic-single/. :tada: (fair warning though: this is a dev server). And with this arrangement, all bandwidth -- git and git-annex -- is paid for through DigitalOcean, instead of splitting the bill between GitHub and Amazon. And when we do promote this to a production server, we can get rid of the difficult contributor AWS credentials.

Unfortunately I've already found one bug that we missed in https://github.com/neuropoly/gitea/issues/19 but it's minor.

Backups

Put backups on https://docs.computecanada.ca/wiki/Arbutus_Object_Storage. Even if backups are encrypted, the data sharing agreement says backups need to stay within the cluster.

I'm working on this now. That wiki page is helpful but I still have to fill in some details, which I am writing down here:

Get dependencies.

You need openstack. I'm on Arch but this should be portable to Ubuntu:
```
sudo pacman -S python-openstackclient
```
Go to https://arbutus.cloud.computecanada.ca/auth/login/?next=/project/ and log in
Download the OpenStack RC file from it.

![Screenshot_20221130_200005](https://user-images.githubusercontent.com/987487/204940478-c285e27a-bbbe-4b47-9152-4915fd2f33d9.png)

Load the OpenStack RC file.

$ . ~/def-jcohen-dev-openrc.sh 
Please enter your OpenStack Password for project def-jcohen-dev as user nguenthe: [ TYPE ARBUTUS PASSWORD HERE ]

Create an S3 token.

Tokens are something I can give out to the backup bot without compromising my complete account.

openstack ec2 credentials create
I could probably use `openstack ec2 credentials create -f json` to script the rest of the process from here, but since this is a rare mostly-one-time operation I'm not going to bother. ``` $ openstack ec2 credentials create # these are not the real credentials, I revoked these after pasting them here +------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Field | Value | +------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | access | 5f192d6ec2124fd3b23ad76036b31107 | | links | {'self': 'https://arbutus.cloud.computecanada.ca:5000/v3/users/885e1521925cb445789bb33f9c0e035ee6e1256d01fbb7257301ec965f86a966/credentials/OS-EC2/5f192d6ec2124fd3b23ad76036b31107'} | | project_id | 455810c28e2e4b36a223e6cd2e6abcdc | | secret | f7828fa44b0a4fa186df7bb6608ff975 | | trust_id | None | | user_id | 885e1521925cb445789bb33f9c0e035ee6e1256d01fbb7257301ec965f86a966 | +------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ```
Generate a restic password

``` $ pwgen 100 1 # this is not the real password wai5woh2gaa0dio6OoSaudoc8AeKufee8Oof6nuNgie7zeitheiciat3aeleghie4Ooqu1juuC3ohroh2ipah9eegaeFurohphah ```
Now, move to the target server and install restic

``` # apt install -y restic ```
Still on the target server, provide the credentials to restic:

As a reminder, restic takes creds via envvars. ``` $ cat spineimage.ca-restic.cfg export RESTIC_REPOSITORY=s3:object-arbutus.cloud.computecanada.ca/def-jcohen-test export RESTIC_PASSWORD=wai5woh2gaa0dio6OoSaudoc8AeKufee8Oof6nuNgie7zeitheiciat3aeleghie4Ooqu1juuC3ohroh2ipah9eegaeFurohphah export AWS_ACCESS_KEY_ID=5f192d6ec2124fd3b23ad76036b31107 export AWS_SECRET_ACCESS_KEY=f7828fa44b0a4fa186df7bb6608ff975 $ . spineimage.ca-restic.cfg ```
Create the backup repo

``` $ restic init created restic repository c22ae4a3c4 at s3:object-arbutus.cloud.computecanada.ca/def-jcohen-test Please note that knowledge of your password is required to access the repository. Losing your password means that your data is irrecoverably lost. $ restic snapshots # test repository c22ae4a3 opened (repository version 2) successfully, password is correct ```

At this point, we only need to keep the restic config. We can toss the OpenStack RC file; if we need it again, we can grab it again.

Now how to interface Gitea with restic?

According to https://docs.gitea.io/en-us/backup-and-restore, backing up Gitea is the standard webapp process: dump the database and save the data folder,. It has a gitea dump command, but warns

Gitea admins may prefer to use the native the MySQL and PostgreSQL dump tools instead. There are still open issues when using XORM for dumping the database that may cause problems when attempting to restore it.

and I indeed ran into this while experimenting a few months ago: you cannot restore a gitea dump, especially if you've gone through a few Gitea versions since taking the backup, so I don't trust gitea dump; all it does is basically run pg_dump > data/gitea-db.sql and then zip the data/ folder. Plus, zipping is slow (it zips the repos!) and may actually make restic work worse by interfering with its own compression.

Also, restoring is a manual process:

There is currently no support for a recovery command. It is a manual process that mostly involves moving files to their correct locations and restoring a database dump.

So I'm going to ignore gitea dump and write my own backup script.

Limitations

From the docs:

Buckets are owned by the user who creates them, and no other user can manipulate them.

I can make tokens for everyone who will be adminning this server, but as far as ComputeCanada is concerned, all of them are me. I don't know how to handle this. Maybe when I eventually hand this off to someone else we'll have to copy all the backups to a new bucket.

Alternatives

According to restic, it can talk to OpenStack directly, without using the S3 protocol. But I got it working through S3 and I think that's fine.

Backups

My existing backup scripts on data.neuro.polymtl.ca (#20) look like

git@data:~$ cat ~/.config/restic/s3 
RESTIC_REPOSITORY=s3:s3.ca-central-1.amazonaws.com/data.neuro.polymtl.ca.restic
RESTIC_PASSWORD=xxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_ACCESS_KEY_ID=aaaaaaaaaaaaaaaaaaaa
AWS_SECRET_ACCESS_KEY=kkkkkkkkkkkkkkkkkkkk

git@data:~$ cat ~/.config/restic/CC
RESTIC_REPOSITORY=sftp://narval.computecanada.ca/:projects/def-jcohen/data.neuro.polymtl.ca.restic
RESTIC_PASSWORD=xxxxxxxxxxxxxxxxxxxxxxxxxxxx

git@data:~$ cat /etc/cron.d/backup-git 
# daily backups
0 2 * * *    git   (set -a; . ~/.config/restic/s3; cd ~; chronic restic backup --one-file-system repositories)
0 3 * * *    git   (set -a; . ~/.config/restic/CC; cd ~; chronic restic backup --one-file-system repositories)

# backup integrity checks
0 4 */3 * *  git   (set -a; . ~/.config/restic/s3; chronic restic check --read-data-subset=1/27)
0 5 */3 * *  git   (set -a; . ~/.config/restic/CC; chronic restic check --read-data-subset=1/9)

# compressing backups by pruning
0 6 * * *    git   (set -a; . ~/.config/restic/s3; chronic restic forget --prune --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3)
0 7 * * *    git   (set -a; . ~/.config/restic/CC; chronic restic forget --prune --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3)

That's for Gitolite. Porting this to Gitea is tricky because of the required downtime. This requirement seems to complicate everything because I want backups to run as gitea, but this forces part of the script to run as root; which means either the whole script needs to start as root then drop privileges, or start as gitea and use sudo to gain privileges, and it's always risky to try to do limited grants with sudo.

I'm tempted to ignore this requirement. I did some experiments and found that git annex sync --content transfers (which use rsync underneath) continue even after systemctl stop gitea, and git push transfers too, so there's no way to get a 100% consistent snapshot anyway.

Experiment

[Peek 2022-12-05 19-10.webm](https://user-images.githubusercontent.com/987487/205773942-1a7e16c2-8ffb-4432-9937-f7b13a5c5bf3.webm)

I'm going to compromise:

# daily backups
0 1 * * *    root   (systemctl stop gitea && su -c 'pg_dump gitea > ~gitea/gitea-db.sql' gitea; systemctl restart gitea)
0 2 * * *    gitea   (set -a; . ~/.config/restic/CC; cd ~; chronic restic backup --one-file-system gitea-db.sql data)

# backup integrity checks
0 4 */3 * *  gitea   (set -a; . ~/.config/restic/CC; chronic restic check --read-data-subset=5G)

# compressing backups by pruning
0 6 * * *    gitea   (set -a; . ~/.config/restic/CC; chronic restic forget --prune --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3)

This way, while the database and contents of data/ may drift a little apart, the worst that will happen is there are some commits in some repos that are newer than the database, or there are some avatars or other attachments that the database doesn't know about.

Ansible

I'm working on coding this up in https://github.com/neuropoly/computers/pull/434

EDIT: in that PR, I decided to ignore backup consistency. I think it will be okay. Only a very busy server would have problems anyway, which our servers will definitely not be, and I'm not even convinced it's that big a problem if the git repos and avatars are slightly out of sync with the database. And Gitea already has code to handle resyncing at least some cases, because digital entropy can always cause mistakes. I think at worst, it may fall back to using an older avatar for one person.

I merged neurogitea backups today and wanted to use them for this prod server. But first I had to

Ubuntu Upgrade

I used do-release-upgrade to upgrade drone.spineimage.ca and spineimage.ca.

There was a snag: the upgrade killed postgres-12 and replaced it with postgres-14; it sent me an email warning me to run pg_upgradecluster 12 main before continuing, but I ignored that and ran apt-get autopurge over eagerly. So I lost the database. :cry:

Restore

Luckily, I had backups from December (taken above). I did apt-get purge postgresql-common, redeployed, and then followed my own docs to get it back:

root@spineimage:~# systemctl stop gitea
root@spineimage:~# su -l gitea -s /bin/bash
$ bash
gitea@spineimage:~$ restic-no arbutus snapshots
repository 2d22bf7f opened successfully, password is correct
ID        Time                 Host           Tags        Paths
---------------------------------------------------------------------------------
2547ebc9  2022-11-30 21:29:17  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

04e8abc1  2022-11-30 21:56:17  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

e66de5bb  2022-12-01 00:20:08  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

95325d1b  2022-12-01 00:20:29  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

4a530419  2022-12-01 00:20:57  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

ae839c6c  2022-12-01 00:26:27  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

197c4af8  2022-12-01 00:26:47  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

48f35777  2022-12-01 00:27:49  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

0ea0845d  2022-12-01 00:28:49  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

342d08dd  2022-12-01 01:18:25  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

221b5622  2022-12-01 01:20:02  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

4c3d1c67  2022-12-01 01:55:42  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

5b49742d  2022-12-01 01:56:52  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

8cc26371  2022-12-01 01:57:49  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql

420e46d9  2023-02-09 23:02:48  spineimage.ca              /srv/gitea/data
                                                          /srv/gitea/gitea-db.sql
---------------------------------------------------------------------------------
15 snapshots
gitea@spineimage:~$ restic-no arbutus restore latest --include gitea-db.sql --target /tmp/r
repository 2d22bf7f opened successfully, password is correct
restoring <Snapshot 420e46d9 of [/srv/gitea/gitea-db.sql /srv/gitea/data] at 2023-02-09 23:02:48.947750225 -0500 EST by gitea@spineimage.ca> to /tmp/r
gitea@spineimage:~$ psql gitea < /tmp/r/gitea-db.sql
SET
SET
SET
SET
SET
 set_config 
------------

(1 row)

SET
SET
SET
SET
SET
SET
ERROR:  relation "access" already exists
[...]
ERROR:  relation "UQE_watch_watch" already exists
ERROR:  relation "UQE_webauthn_credential_s" already exists
gitea@spineimage:~$ exit

unfortunately I didn't take the backup with pg_dump --clean --if-exists, so I got all these errors. So I manually recreated an empty DB:

root@spineimage:~# sudo -u postgres psql 
psql (14.6 (Ubuntu 14.6-0ubuntu0.22.04.1))
Type "help" for help.

postgres-# \l
                              List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype  |   Access privileges   
-----------+----------+----------+---------+---------+-----------------------
 gitea     | gitea    | UTF8     | C.UTF-8 | C.UTF-8 | 
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
 template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
(4 rows)

postgres-# drop database gitea
postgres=# create database gitea with owner gitea;
CREATE DATABASE
postgres-# \l
                              List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype  |   Access privileges   
-----------+----------+----------+---------+---------+-----------------------
 gitea     | gitea    | UTF8     | C.UTF-8 | C.UTF-8 | 
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
 template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
(4 rows)

postgres-# \q

and reloaded again:

root@spineimage:~# su -l gitea -s /bin/bash
gitea@spineimage:~$ psql gitea < /tmp/r/gitea-db.sql
SET
[...]
CREATE INDEX
gitea@spineimage:~$ exit
root@spineimage:~# systemctl restart gitea

Gitea Upgrade

The redeploy above took care of upgrading Gitea, which went smoothly. It is now at 1.18.3+git-annex-cornerstone

Screenshot 2023-02-10 at 00-52-28 Neurogitea

and it does the inline previews (but I can't demo that here because it's private data).

I just noticed that ComputeCanada is affiliated with https://www.frdr-dfdr.ca/, sponsored by the federal government. They use Globus to upload data where we use git-annex. Should we consider recommending that instead of git?

They don't seem to do access control:

Anyone may use FRDR to search for and download datasets. You do not need to have a Globus Account affiliated with a Canadian postsecondary institution to download datasets in FRDR using Globus.

which rules them out for our use case. Also Globus doesn't do versioning, I'm pretty sure. For example, just look through https://www.frdr-dfdr.ca/discover/html/repository-list.html?lang=en and find, say, https://www.frdr-dfdr.ca/repo/dataset/6ede1dc2-149b-41a4-9083-a34165cb2537 and that doesn't show anything labelled "versions" as far as I can see.

Meeting - site_012

Today our site 12 curator was able to finish a dcm2bids config file and curate all subjects from her site, with David's advice. We included several t2starw images, that initially David thought we should drop until we realized they made up a large portion of Lisa's dataset.

One subject was dropped -- the previous sub-hal001 -- and the others subject IDs renumbered to start counting at sub-hal001.

There was also some debate about whether to tag sagittal scans with acq-sag or acq-sagittal. At poly we've used acq-sagittal but Ottawa's dataset uses acq-sag. At neuropoly we will standardize on this internally to match.

Right now curators have to manually run dcm2bids for each subject. @valosekj and @mguaypaq and I think it should be possible to write the loop for this, and write that into a script in each dataset's code/ folder. That would make curation more robust. @valosekj pointed out we can store the imaging IDs (the IDs used for each dataset's sourcedata/${ID}) in participants.tsv, using the 'source_id' column that we've standardized on. I think we can probably write a loop script that we can share between curators that reads their participants.tsv to know what sourcedata/ folder to look at (and maybe even patch dcm2bids with a batch mode that understands doing just that). That would be a lot more reliable than having curators run through the curation subject by subject every time they tweak the config file.

We have not yet run bids-validator on the dataset.

We spent a while trying to get the curator able to upload to spineimage.ca. I hoped we could start by committing and publishing the config file, and then adding subject data in steps, and refining the config file with more commits from there. I hoped it would be quick but we hit bugs and ran out of time for today. ssh is giving this error:

Screenshot_20230307_151447

We double-checked by using a few different servers and ports and also an PuTTY, an entirely separate program that should have nothing to do with the ssh that comes with Windows Git Bash

Screenshot_20230307_151723 Screenshot_20230307_151800

In all cases a closed port times out, an open port gives this error, after a successful TCP handshake. I remember the connection working back in July when we first were in touch with this site, but since then their hospital IT has done upgrades and now there seems to be some sort of firewall in the way. I will follow up with the curator by email with specific debugging instructions she can relay to her IT department.

Meeting - site_012

Today, because Halifax's IT department is apparently backlogged by months, we debugged further but without success. We created a short python script that basically just implements printf 'GET /\r\n\r\n' | nc $HOST $PORT and ran it against a few combinations or hosts and ports. In all cases, this worked, but when using an SSH client we are blocked. So SSH seems to be blocked as a protocol: when connecting over to port 443 on spineimage.ca via https://spineimage.ca:443, it worked, but when I reconfigured the server to run ssh on that same port and we tried both ssh://spineimage.ca:443 and sftp://spineimage.ca:443, it timed out; similarly, ssh -v reports "connected" before hanging and reporting "software caused connection abort" -- so the TCP connection is allow to port 22, but somehow to content of that connection is tripping it up. I suspect there is a deep-packet-inspecting firewall involved that is specifically blocking ssh :anger: .

As a workaround, we constructed a .tar.gz of the dataset and encrypted it using these tools https://stackoverflow.com/a/16056298/2898673. We resulted in a 1GB encrypted file, site_012.tar.gz.enc that's on Lisa's office machine. Mathieu and I have the decryption key saved securely on our machines. The curator is going to try to use a file hosting service NSHealth runs. If that fails, it might be possible to create an Issue in https://spineimage.ca/NSHA/site_012/issues and drag-and-drop the file onto it (so it becomes an attachment there), and in the worst case, it can be mailed on a thumbdrive to us at

Julien Cohen-Adad Ecole Polytechnique, Genie electrique 2500, Chemin Polytechnique, Porte S-114 Montreal, QC H3T 1J4 Canada

(ref: https://neuro.polymtl.ca/contact-us.html)

In the future, perhaps as a different workaround we can set up an HTTPS proxy; if the problem is a deep-packet inspecting firewall, wrapping the ssh connection in an HTTPS one should defeat it. I believe these instructions solve that https://stackoverflow.com/a/23616021/2898673 and we can pursue that in the future if/when we need to do this again.

EDIT: the curator created an account for us on https://sfts1.gov.ns.ca/ and sent us the encrypted file, and @ mguaypaq was able to download it and upload the contents to https://spineimage.ca. On Thursday, May the 18th, there was a final meeting to demonstrate to the Halifax curator that their work was done as far as it can be for now. If we need their help making edits to the data we will be right back here unless we figure out some kind of proxy situation.

neuropoly / data-management

Praxis Data Server (`https://spineimage.ca`) #77

Needs

Software Design

Global Architecture

Per-Site Architecture

Components

Data Servers

Uploading/Downloading

Versioning

Permissions

Sharing

Portal

Packaging

Software Development Tasks

Open Questions

Summary

Cost Estimates

Hardware

Networking

Sysadmin Access Control

Basic config

updates

unattended-upgrades

la la la

GIN

Modern Research Data Management for Neuroscience

Manage your research data

Share your data

Version your data

Open Source

I want to know more!

the port 2222 problem

@@ -13,41 +13,5 @@

- {{svg "octicon-flame"}} {{.i18n.Tr "startpage.install"}} -

- {{svg "octicon-device-desktop"}} {{.i18n.Tr "startpage.platform"}} -

- {{svg "octicon-rocket"}} {{.i18n.Tr "startpage.lightweight"}} -

- {{svg "octicon-code"}} {{.i18n.Tr "startpage.license"}} -

Meeting - site_03

User Documentation

Prod Plan: https://spinalimage.ca

Future dev plan: https://spineimage.ca.dev.neuropoly.org

Meeting - site_012

Backups

Meeting - site_012

Backups

Limitations

Alternatives

Backups

Ansible

Ubuntu Upgrade

Restore

Gitea Upgrade

Meeting - site_012

Meeting - site_012

`unattended-upgrades`