spine-generic / data-multi-subject

Multi-subject data for the Spine Generic project
Creative Commons Attribution 4.0 International

Grant contributors Amazon S3 access for PRs #75

Open kousu opened 3 years ago

kousu commented 3 years ago

git-annex makes it difficult to use multiple annexes, which makes the fork-and-pull-request model of contribution awkward. A contributor would need to find their own annex hosting and temporarily add it (globally!) to the repo's git-annex metadata, and then whoever accepts the PR would need to remember to run git annex copy --from contributor followed by git annex copy --to amazon on everything.

So, we're going to cut a corner: we will grant contributors access to our S3 bucket directly. They'll still have to do a PR, but the PR will have already written to the S3 annex.

This thread will document how to go about doing this, and should become a part of #1.

Permissions

We don't want to grant full access, so the process needs to at least grant a restricted access token to our users.
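As a rough illustration of what "restricted" could mean here, the sketch below builds an IAM policy document that allows listing, downloading and uploading but leaves out deletion. The helper name contributor_policy is hypothetical, the bucket name is borrowed from the example command later in this post, and the exact set of actions git-annex needs is an assumption that has not been checked against a real S3 special remote.

```python
import json


def contributor_policy(bucket: str) -> dict:
    """Build an IAM policy document for a contributor (hypothetical helper).

    Assumption: list + get + put is enough for git-annex to upload;
    s3:DeleteObject is deliberately left out.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions apply to the keys under the bucket.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = contributor_policy("data2--spine-generic--neuropoly")
print(json.dumps(policy, indent=2))
```

Note that s3:PutObject on its own still lets a contributor overwrite an existing key, so a policy like this restricts deletion but does not make the bucket write-once.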

Moreover, ideally our data would be write-once, since we want to archive it for SCIENCE. Luckily, it looks like S3 has recently gained full support for this. I am unsure how to integrate it with git-annex, though: to use it, you MUST turn it on at bucket creation time (e.g. aws s3api create-bucket --bucket data2--spine-generic--neuropoly --object-lock-enabled-for-bucket --create-bucket-configuration LocationConstraint=ca-central-1), and I'm pretty sure git-annex insists on creating the bucket itself. So there may be some problems there.

One catch: files that need further editing during review will end up leaving detritus all over the bucket. It won't be much, but it'll be some. We can counteract that by writing scripts that compare what's in the repo with what's in the bucket to find orphaned files, and then using an account with extra permissions to clean those files out.
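The comparison step could look something like the sketch below. It is only the set-difference core: the function name find_orphans, the optional prefix parameter and the sample keys are all made up for illustration, and how the two key lists are actually produced (from git-annex on one side, from a bucket listing on the other) is left out.

```python
def find_orphans(repo_keys, bucket_keys, prefix=""):
    """Return bucket keys that no file in the repo refers to.

    repo_keys:   annex keys referenced by the repo (assumed already listed)
    bucket_keys: object keys found in the bucket (assumed already listed)
    prefix:      optional key prefix used in the bucket (hypothetical)
    """
    referenced = {prefix + key for key in repo_keys}
    return sorted(
        key
        for key in set(bucket_keys)
        # Only consider objects under our prefix; ignore anything else.
        if key.startswith(prefix) and key not in referenced
    )


# Made-up keys, shaped loosely like git-annex keys.
repo_keys = ["SHA256E-s10--aaa.nii.gz", "SHA256E-s20--bbb.nii.gz"]
bucket_keys = repo_keys + ["SHA256E-s30--ccc.nii.gz"]

orphans = find_orphans(repo_keys, bucket_keys)
print(orphans)
```

The output is only a list of candidates. Actually deleting them would be a separate step, run by hand from the account with extra permissions.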

kousu commented 3 years ago

I have to do this for someone sometime in the next couple of weeks. I'll take screenshots as I go (censoring the sensitive parts).