Closed bpoldrack closed 1 year ago
For a user, in this particular dataset, the errors manifest as follows:
A datalad clone produces a variety of errors, some of them internal git-annex errors. The command also takes roughly 5 minutes (on my system), until it finally finishes with a "could not connect to server" message:
❱ datalad clone git@github.com:psychoinformatics-de/studyforrest-data-visualrois.git
[INFO ] scanning for annexed files (this may take some time)
[INFO ] Unable to parse git config from origin
[INFO ] Remote origin does not have git-annex installed; setting annex-ignore
| This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
[INFO ] error: remote mddatasrc already exists.
[INFO ] git [Param "remote",Param "add",Param "mddatasrc",Param "http://psydata.ovgu.de/studyforrest/freesurfer/.git"] failed
[INFO ] RIA store unavailable. -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to establish a new session 1 times. -caused by- HTTPConnectionPool(host='studyforrest.ds.inm7.de', port=80): Max retries exceeded with url: /ria-layout-version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd55228fbd0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
[WARNING] Failed to fetch type=git special remote mddatasrc: CommandError(CommandError: 'git -c diff.ignoreSubmodules=none fetch --verbose --progress mddatasrc' failed with exitcode 128 under /tmp/studyforrest-data-visualrois [err: 'fatal: unable to access 'http://psydata.ovgu.de/studyforrest/visualrois/.git/': Failed to connect to psydata.ovgu.de port 80 after 131126 ms: Couldn't connect to server'])
^CERROR:
Interrupted by user while doing magic: KeyboardInterrupt()
datalad clone 8.48s user 2.60s system 4% cpu 4:13.21 total
Although the clone succeeds and there is a worktree, datalad and git-annex operations appear to stall. Only git-annex's debug output makes it evident that the broken special remote is the cause:
❱ git annex -d -v get --from origin sub-01/rois/lEBA_2_mask.nii.gz
[2023-04-13 13:30:10.531453004] (Utility.Process) process [138807] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2023-04-13 13:30:10.534125464] (Utility.Process) process [138807] done ExitSuccess
[2023-04-13 13:30:10.534778431] (Utility.Process) process [138808] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2023-04-13 13:30:10.537371471] (Utility.Process) process [138808] done ExitSuccess
[2023-04-13 13:30:10.538040037] (Utility.Process) process [138809] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..8c48b9f3f5944e3a8bf2ef0e64a79683464d5d01","--pretty=%H","-n1"]
[2023-04-13 13:30:10.541055684] (Utility.Process) process [138809] done ExitSuccess
[2023-04-13 13:30:10.541644348] (Utility.Process) process [138810] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..9dd5b1b7d608b599d5c22056f7e232b9d34de7a1","--pretty=%H","-n1"]
[2023-04-13 13:30:10.544671337] (Utility.Process) process [138810] done ExitSuccess
[2023-04-13 13:30:10.545880754] (Utility.Process) process [138811] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
[2023-04-13 13:30:10.586119458] (Utility.Url) Request {
host = "psydata.ovgu.de"
port = 80
secure = False
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/10.20221003")]
path = "/studyforrest/visualrois/.git/config"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
This affects every source of the dataset.
The errors showing up for users are about several special remotes and their setup in this dataset.
1.) The most important issue is the original type git special remote mddatasrc pointing to the no longer existing http://psydata.ovgu.de/studyforrest/visualrois/.git. This needs to be changed to https://datapub.fz-juelich.de/studyforrest/studyforrest/visualrois/.git. This special remote has the UUID 9536f86d-eb34-42ed-8ffc-fafd63a2b87e.
2.) For some reason there is a second type git special remote registered under the same name, mddatasrc, but pointing to http://psydata.ovgu.de/studyforrest/freesurfer/.git. This appears to be a mistake to begin with. At the very least it should be changed to autoenable=false to avoid spamming users with misleading and irrelevant errors. I'd suggest declaring it dead in addition. This special remote has the UUID db2e8480-0894-4e67-93b3-28d0d64d629b.
3.) The ORA special remote pointing to INM-7 is not publicly available but autoenabled. Technically, that's fine; git-annex will simply report its inability to enable it during clone. The message is at INFO level, indicating it's nothing to worry about. The message itself, however, reports an error. Especially when other errors occur, this is misleading. I think it's worth considering not autoenabling it.
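To spot name clashes like the one in 2.), one could scan the special remote registry kept on the git-annex branch. The following is only a sketch: the helper name duplicate_remote_names is made up for illustration, and it assumes that remote.log holds one line per special remote, starting with the UUID and followed by space-separated key=value fields (including name=...).

```shell
# Hypothetical helper: list special-remote names registered under more than
# one UUID. Reads remote.log content on stdin.
# Assumption: each line is "<uuid> key=value key=value ... timestamp=...".
duplicate_remote_names() {
    awk '
        {
            uuid = $1
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^name=/) {
                    name = substr($i, 6)
                    # Count each (name, UUID) pair once, even if the log
                    # contains several lines for the same UUID.
                    if (!((name, uuid) in seen)) {
                        seen[name, uuid] = 1
                        count[name]++
                        uuids[name] = uuids[name] " " uuid
                    }
                }
            }
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    print name ":" uuids[name]
        }
    '
}
```

Inside a clone, and assuming a local git-annex branch exists, it could be fed with: git cat-file -p git-annex:remote.log | duplicate_remote_names. Given the situation described above, one would expect mddatasrc to be listed with both UUIDs.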
Fixing 1.) is a bit non-obvious because of the git-type special remote and its interaction with an actual git remote.
Right after clone, one cannot simply use git annex enableremote mddatasrc location=NEWURL, because git-annex would try to enable the git remote called mddatasrc that was added to .git/config during autoenabling of the git-type special remote. So git-annex resolves the name to the wrong thing. If one uses the UUID instead, however, git annex enableremote will try to git remote add the respective git remote, which already exists. Hence, this also fails. That's because enableremote does not seem to account for being used for reconfiguration rather than plain enabling in its internal flow. It is, however, the way to reconfigure. Therefore, what is required is:
git remote remove mddatasrc
git annex enableremote 9536f86d-eb34-42ed-8ffc-fafd63a2b87e location=https://datapub.fz-juelich.de/studyforrest/studyforrest/visualrois/.git
The second call will then reintroduce the git remote with the corrected URL locally. In addition, I am not sure whether the "right" git remote would be there for everyone on every system at this point, because of the second mddatasrc. Removing any mddatasrc git remote from .git/config should work in any case, though.
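Since it is unclear whether a git remote named mddatasrc exists on every system, the removal can be made conditional. A minimal sketch, not a tested procedure: the helper names run and reconfigure_git_special_remote are hypothetical, and DRY_RUN defaults to 1 here so that the commands are only printed unless DRY_RUN=0 is set explicitly.

```shell
# Print the command when DRY_RUN is 1 (the default in this sketch),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: reconfigure_git_special_remote NAME UUID URL
reconfigure_git_special_remote() {
    name=$1
    uuid=$2
    url=$3
    # Drop the stale plain git remote only if autoenabling actually created one.
    if git remote get-url "$name" >/dev/null 2>&1; then
        run git remote remove "$name"
    fi
    # Address the special remote by UUID so the name cannot be misresolved.
    run git annex enableremote "$uuid" "location=$url"
}
```

For the case at hand, the call would be: reconfigure_git_special_remote mddatasrc 9536f86d-eb34-42ed-8ffc-fafd63a2b87e https://datapub.fz-juelich.de/studyforrest/studyforrest/visualrois/.git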
WRT 2.), I'd suggest:
git annex dead db2e8480-0894-4e67-93b3-28d0d64d629b
git remote remove mddatasrc
git annex enableremote db2e8480-0894-4e67-93b3-28d0d64d629b autoenable=false
git remote remove mddatasrc
Considering the fix for 1.) above, this is more convenient to do first. The git remote remove is done for the same reasons as in 1.).
WRT 3.):
This should be a matter of git annex enableremote inm7-storage autoenable=false, but I feel this needs judgement by others.
So, overall:
clone from wherever
git annex dead db2e8480-0894-4e67-93b3-28d0d64d629b
git remote remove mddatasrc
git annex enableremote db2e8480-0894-4e67-93b3-28d0d64d629b autoenable=false
git remote remove mddatasrc
git annex enableremote 9536f86d-eb34-42ed-8ffc-fafd63a2b87e location=https://datapub.fz-juelich.de/studyforrest/studyforrest/visualrois/.git
Running an additional git annex fsck -f mddatasrc --fast may be good, but if the content available from the new location is supposed to be identical, it is not strictly necessary.
Finally, push, of course.
One last point: The particular interaction between a git remote and a git-type special remote makes it a bit strange to use enableremote for a configuration change. This may be worth pointing out to Joey.
Here's a coordination issue for a collaborative fix: https://github.com/psychoinformatics-de/studyforrest-data/issues/62
Origin: https://github.com/psychoinformatics-de/studyforrest-data-visualrois/issues/6
TODO (not necessarily to be performed in this order)