Open kousu opened 3 years ago
@mguaypaq proposed today that these cruft remotes aren't just messy, they're a re-identification risk: because git-annex adds a comment to each remote with their `user@hostname` and a timestamp, it might be possible to retrace who touched each subject and figure out who they'd scanned.
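To make the risk concrete, here is a minimal sketch of how someone could pull names, machines and dates back out of that metadata. It assumes the remote descriptions live in `uuid.log` on the `git-annex` branch, one line per repository in the form `<uuid> <description> timestamp=<epoch>s`, and that the default description looks like `user@hostname:path`. The function name, the regexes and the sample lines are all made up for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed uuid.log line layout: "<uuid> <description> timestamp=<epoch>s"
LINE_RE = re.compile(r"^(?P<uuid>\S+) (?P<desc>.*) timestamp=(?P<ts>[0-9.]+)s$")
# Assumed default description layout: "user@hostname:path"
DESC_RE = re.compile(r"^(?P<user>[^@\s]+)@(?P<host>[^:\s]+):(?P<path>.*)$")


def parse_uuid_log(text):
    """Return one dict per parseable line: who, on which machine, and when."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't fit the assumed layout
        desc = m.group("desc")
        who = DESC_RE.match(desc)
        records.append({
            "uuid": m.group("uuid"),
            "description": desc,
            "user": who.group("user") if who else None,
            "host": who.group("host") if who else None,
            "when": datetime.fromtimestamp(float(m.group("ts")), tz=timezone.utc),
        })
    return records


# Hypothetical sample, not copied from any real dataset.
SAMPLE = """\
00000000-0000-0000-0000-000000000001 alex@MacBook-Pro.local:~/data-multi-subject timestamp=1618000000.5s
00000000-0000-0000-0000-000000000002 amazon timestamp=1617000000s
"""

if __name__ == "__main__":
    for r in parse_uuid_log(SAMPLE):
        print(r["when"].date(), r["user"], r["host"], r["description"])
```

On a real clone the input could probably be obtained with `git show git-annex:uuid.log`. Anyone with read access to the published repository can do the same, which is the point.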
git-annex is 3 components afaict: a partial-download system, a bunch of plugins for using different kinds of URLs, and a content-tracking system on top of the two. I think the content-tracking system is a source of a lot of grief for us (e.g. #67). Another source of grief is that datasets implicitly record their paths whenever they are installed anywhere, even if only temporarily. And then if they are ever `sync`ed back, they will infect the root dataset even without going through a pull request.
For example: https://github.com/spine-generic/data-multi-subject/pull/77#issuecomment-818980337
I will never be able to connect to Alex's MacBook-Pro. This is a useless piece of information. And keeping it around makes handling merges harder and makes parsing through data harder.
You can see this in other published datasets too. For example, anything on openneuro:
That `openneuro-prod-dataset-worker-2` is an ephemeral build bot; no one except openneuro will ever be able to access it, and it should not be published, yet it is.
I've been recommending
to work around this. The only copies that shouldn't have this are the ones on data.neuro.polymtl.ca or the ones on amazon and those get set automatically. Every working copy should have this, IMO.
git-annex was designed mainly as a personal Dropbox-like system, to corral many disks and cloud accounts into one big meta filesystem, whereas we're using it like we use the rest of git, with collaboration and forking, and these two models don't mesh well.