Open adswa opened 1 year ago
Here's a log of me doing it for studyforrest-data-aligned:
(handbook) adina@muninn in /tmp
❱ datalad clone git@github.com:psychoinformatics-de/studyforrest-data-aligned.git
[INFO ] Unable to parse git config from origin
[INFO ] Remote origin does not have git-annex installed; setting annex-ignore
[INFO ] This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
[INFO ] RIA store unavailable. -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to establish a new session 1 times. -caused by- HTTPConnectionPool(host='studyforrest.ds.inm7.de', port=80): Max retries exceeded with url: /ria-layout-version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe08f3af410>: Failed to establish a new connection: [Errno -2] Name or service not known'))
^CERROR:
Interrupted by user while doing magic: KeyboardInterrupt()
(handbook) adina@muninn in /tmp
❱ cd studyforrest-data-aligned 3 !
(handbook) adina@muninn in /tmp/studyforrest-data-aligned on git:master
❱ ls
code LICENSE src sub-02 sub-04 sub-06 sub-10 sub-15 sub-17 sub-19
datacite.yml README.md sub-01 sub-03 sub-05 sub-09 sub-14 sub-16 sub-18 sub-20
(handbook) adina@muninn in /tmp/studyforrest-data-aligned on git:master
❱ git cat-file -p git-annex:remote.log
77730816-fef8-459d-9c1c-3bb46a20fe0e archive-id=c8ec2919-493b-4af5-9271-cbe9ebd08c43 autoenable=true encryption=none externaltype=ora name=inm7-storage push-url=ria+ssh://bulk1.htc.inm7.de/ds/studyforrest/srv type=external url=ria+http://studyforrest.ds.inm7.de timestamp=1620023916.544174369s
7dd5970d-cee5-404e-a3be-6430ec03657f autoenable=true location=http://psydata.ovgu.de/studyforrest/aligned/.git name=mddatasrc type=git timestamp=1453280984.013246s
(handbook) adina@muninn in /tmp/studyforrest-data-aligned on git:master
❱ git remote remove mddatasrc
(handbook) adina@muninn in /tmp/studyforrest-data-aligned on git:master
❱ git annex enableremote 7dd5970d-cee5-404e-a3be-6430ec03657f location=https://datapub.fz-juelich.de/studyforrest/studyforrest/aligned/.git
enableremote 7dd5970d-cee5-404e-a3be-6430ec03657f ok
(recording state in git...)
(handbook) adina@muninn in /tmp/studyforrest-data-aligned on git:master
❱ datalad get sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold_mcparams.txt
get(ok): sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold_mcparams.txt (file) [from mddatasrc...]
(handbook) adina@muninn in /tmp/studyforrest-data-aligned on git:master
❱ datalad push
publish(ok): . (dataset) [refs/heads/git-annex->origin:refs/heads/git-annex 304f2250..3a9c6331]
action summary:
publish (notneeded: 1, ok: 1)
(handbook) adina@muninn in /tmp/studyforrest-data-aligned on git:master
❱ cd ..
(handbook) adina@muninn in /tmp
❱ datalad clone git@github.com:psychoinformatics-de/studyforrest-data-aligned.git t
[INFO ] Unable to parse git config from origin
[INFO ] Remote origin does not have git-annex installed; setting annex-ignore
[INFO ] This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
[INFO ] RIA store unavailable. -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to establish a new session 1 times. -caused by- HTTPConnectionPool(host='studyforrest.ds.inm7.de', port=80): Max retries exceeded with url: /ria-layout-version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f60bec93810>: Failed to establish a new connection: [Errno -2] Name or service not known'))
install(ok): /tmp/t (dataset)
(handbook) adina@muninn in /tmp
❱ cd t
(handbook) adina@muninn in /tmp/t on git:master
❱ datalad get sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold_mcparams.txt
get(ok): sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold_mcparams.txt (file) [from mddatasrc...]
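The location change in the log above follows a simple pattern: the prefix http://psydata.ovgu.de/ becomes https://datapub.fz-juelich.de/studyforrest/. Here is a minimal sketch of that rewrite. It assumes the pattern holds for the other datasets too, which only the aligned and aggregate examples in this thread confirm. The function name new_location is made up for illustration.

```shell
# Rewrite an old psydata.ovgu.de location to its datapub.fz-juelich.de counterpart.
# Assumption: only the URL prefix changes; inputs with any other prefix are printed unchanged.
new_location() {
    printf '%s\n' "$1" |
        sed 's|^http://psydata\.ovgu\.de/|https://datapub.fz-juelich.de/studyforrest/|'
}
```

For example, new_location "http://psydata.ovgu.de/studyforrest/aligned/.git" prints the location that was passed to git annex enableremote in the log. Always compare the result with what is actually served on datapub before enabling the remote.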
Thanks @adswa for the great instructions!
I have changed the mddatasrc location in https://github.com/psychoinformatics-de/studyforrest-data-phase2, and datalad get works now.
I get the same [INFO] message about an unavailable RIA store, which I assume is OK, right?
Yes, this message is unrelated to the special remote 👍
This is what I get for https://github.com/psychoinformatics-de/studyforrest-data-aggregate.
Two git-annex remotes, nothing about mddatasrc, access errors:
❱ datalad clone https://github.com/psychoinformatics-de/studyforrest-data-aggregate.git
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://github.com/psychoinformatics-de/studyforrest-data-aggregate.git/config download failed: Not Found
[INFO ] RIA store unavailable. -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to establish a new session 1 times. -caused by- HTTPConnectionPool(host='studyforrest.ds.inm7.de', port=80): Max retries exceeded with url: /ria-layout-version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f97dccd2ad0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
[WARNING] Failed to fetch type=git special remote psydata: CommandError(CommandError: 'git -c diff.ignoreSubmodules=none fetch --verbose --progress psydata' failed with exitcode 128 under /Users/jsheunis/Documents/psyinf/Data/studyforrest-data-aggregate [err: 'fatal: unable to access 'http://psydata.ovgu.de/studyforrest/aggregate/.git/': Failed to connect to psydata.ovgu.de port 80: Operation timed out'])
install(ok): /Users/jsheunis/Documents/psyinf/Data/studyforrest-data-aggregate (dataset)
❱ git cat-file -p git-annex:remote.log
11d89be1-d3e3-4803-8ba2-c168411b4e80 autoenable=true location=http://psydata.ovgu.de/studyforrest/aggregate/.git name=psydata type=git timestamp=1511528268.553343287s
b1cffbff-ef07-4f22-a736-53d92eeb2c7a archive-id=7fcd8812-d0fe-11e7-8db2-a0369f7c647e autoenable=true encryption=none externaltype=ora name=inm7-storage push-url=ria+ssh://bulk1.htc.inm7.de/ds/studyforrest/srv type=external url=ria+http://studyforrest.ds.inm7.de timestamp=1620022089.160920561s
❱ datalad get sub-01/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
get(error): sub-01/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz (file) [Remote psydata not usable by git-annex; setting annex-ignore
http://psydata.ovgu.de/studyforrest/aggregate/.git/config download failed: ConnectionFailure Network.Socket.connect: <socket: 19>: timeout (Operation timed out)]
Thx! In this case, the remote is not called mddatasrc but psydata. Can you replace mddatasrc with psydata in all commands? I believe this should do the trick. Thank you so much! :)
https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms (fixed, but some get "impossible" and "errors" remain. See datalad get sub-01)
I'm investigating :+1:
I enabled the new remote with the correct location, but I'm still getting errors when retrieving file content:
> git remote remove psydata
> git annex enableremote 11d89be1-d3e3-4803-8ba2-c168411b4e80 location=https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/.git
enableremote 11d89be1-d3e3-4803-8ba2-c168411b4e80 ok
(recording state in git...)
> datalad get sub-01/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
get(error): sub-01/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz (file) [download failed: Not Found
failed to download content
download failed: Not Found
failed to download content
download failed: Not Found
failed to download content]
with debug:
[DEBUG ] received JSON result from annex: {'command': 'get', 'error-messages': [' download failed: Not Found', ' failed to download content', ' download failed: Not Found', ' failed to download content', ' download failed: Not Found', ' failed to download content'], 'file': 'sub-16/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz', 'input': ['sub-16/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz'], 'key': 'MD5E-s40787--8ade422c9c0d49788f3b6ad793f81b9c.nii.gz', 'note': 'from psydata...\nUnable to access these remotes: psydata\n(Note that these git remotes have annex-ignore set: origin)', 'success': False, 'wanted': [{'description': 'mih@meiner:~/forrest/collection/aggregate', 'here': False, 'uuid': '11d60f59-d220-4b86-9b84-7fdcfe6937c7'}, {'description': '', 'here': False, 'uuid': '272356f8-65a0-4ba5-a217-1c7ebb97903d'}, {'description': 'mih@medusa:/home/data/psyinf/forrest_gump/collection/aggregate', 'here': False, 'uuid': '2d364a44-eb57-4a88-9c75-a1a22fbabfeb'}, {'description': 'inm7-storage', 'here': False, 'uuid': 'b1cffbff-ef07-4f22-a736-53d92eeb2c7a'}, {'description': 'git@82709b2ed170:/data/repos/studyforrest/aggregate-fmri-timeseries.git', 'here': False, 'uuid': 'f2cd7b91-6ce6-490f-b2dd-21bce9b90b6b'}]}
Thx, I will investigate and report back what I found! :+1:
Edit: This was fixed and pushed. Done!
First observation about https://github.com/psychoinformatics-de/studyforrest-data-aggregate: Some files on datapub.fz-juelich.de seem access-restricted. I don't know what to do here, so I'll tag @aqw and @mih for potential insights
It affects the following files:
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/atlases/shen/fconn_atlas_150_1mm.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/atlases/shen/fconn_atlas_150_2mm.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-01/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-02/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-03/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-04/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-05/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-06/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-09/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-10/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-14/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-15/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-16/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-17/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-18/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-19/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
https://datapub.fz-juelich.de/studyforrest/studyforrest/aggregate/sub-20/atlases/bold3Tp2/shen_fconn_atlas_150.nii.gz
Edit: I only now realized that these are all the annexed files in the dataset; the rest is in Git.
EDIT: The problem is that the dataset on https://datapub.fz-juelich.de/studyforrest/studyforrest/templatetransforms/.git/ contains an old version, with the last commit from 2016. The dataset on GitHub has more recent commits. They seem to originate from juseless, but other than the commits, these changes were not published. If we push this dataset from data1:/data/project/studyforrest/superds/derivative/image_space_transformations to datapub, this should get fixed. I don't have permissions to do this.
For https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms I also need some help, so I'm tagging @mih and @bpoldrack:
There are files that can't be retrieved, e.g.:
sub-01/bold3Tp2/in_t1w/brain.nii.gz
sub-01/bold3Tp2/in_t1w/xfm_6dof.mat
sub-01/t1w/in_bold3Tp2/brain.nii.gz
sub-01/t1w/in_bold3Tp2/xfm_6dof.mat
This is the availability information registered for those files (exemplary for one, matches all of them). The important bit is that the enabled [mddatasrc] special remote isn't listed.
❱ git annex whereis sub-01/bold3Tp2/in_t1w/xfm_6dof.mat
whereis sub-01/bold3Tp2/in_t1w/xfm_6dof.mat (4 copies)
43613943-720c-4018-a7a5-40c6fb9ad603 -- inm7-storage
529fccea-fdf5-4266-99a4-769e2638f82f -- mih@medusa:/home/data/psyinf/forrest_gump/collection/tnt
a6358f69-bae7-4035-a9b8-7751eb3d9144 -- git@82709b2ed170:/data/repos/studyforrest/imagespace-transformations.git
f2ec3af6-e466-4951-b1dc-4991ade8f171 -- mih@data1:/data/project/studyforrest/superds/derivative/image_space_transformations
ok
However, the files are available at mddatasrc, for example https://datapub.fz-juelich.de/studyforrest/studyforrest/templatetransforms/sub-01/bold3Tp2/in_t1w.
I already did a git annex fsck --from mddatasrc which reported success, but did not update availability. My question is: how can I tell git-annex that mddatasrc is a suitable location for the files in question, too? Is it a job for addurls?
A side question is whether those files are left unregistered on purpose, e.g., because of data privacy.
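The check described above, whether git annex whereis lists the enabled special remote for a file, can be scripted. This is a minimal sketch, assuming that whereis prints the name of a configured remote in square brackets, as in [mddatasrc]. The function name whereis_lists_remote is made up for illustration.

```shell
# Succeed (exit 0) if `git annex whereis` output on stdin lists the named remote.
# Assumption: configured remotes appear with their name in square brackets, e.g. "[mddatasrc]".
whereis_lists_remote() {
    # -F: treat the pattern as a fixed string, so the brackets are literal
    # -q: no output, only the exit status matters
    grep -F -q -- "[$1]"
}
```

Usage would look like: git annex whereis sub-01/bold3Tp2/in_t1w/xfm_6dof.mat | whereis_lists_remote mddatasrc || echo "mddatasrc not listed"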
As for https://github.com/psychoinformatics-de/studyforrest-data-phase2-denoised, we don't have this data; all sources are with OpenNeuro as far as I can see.
Edit: The dataset here on GitHub is outdated. The problem is that the data was updated upstream, and the content from the now unavailable files was moved to *_decomposition.json in commit de145f67a3da26f1d39187403340d7380d928cf2 (tag 1.3.0). ~I will get the dataset in sync with the one from OpenNeuro~ After a quick discussion in the chat, we decided to add a fork of the OpenNeuro dataset instead of syncing.
A quick overview of a TODO for @mih:
Please go to data1:/data/project/studyforrest/superds/derivative/image_space_transformations (I hope this still exists).
Check the git history of the dataset. The last commit registered at mddatasrc is 424d1252. Here's the log of what happened in the meantime on GitHub (https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms/commit/424d125221e0ff094e30031e6290aa251cb30ad2 is at the bottom):
0afa47d7 (HEAD -> master, origin/master, origin/HEAD) fix a typo
e5fba37e Merge pull request #3 from psychoinformatics-de/christian-monch-patch-1
19c5dd32 Update datacite.yml
c3abaf15 Add files via upload
688d8d85 Merge remote-tracking branch 'github/master'
00f37947 Merge pull request #2 from loj/README
164feb82 DOC: convert README to markdown
c5b30550 [DATALAD] new dataset
2d467607 Merge remote-tracking branch 'github/master'
f3db608f Saving the result of a rerun after code change in d913360f290a7bdadfe6>
89035477 Merge pull request #1 from adswa/README
3e536c9a DOC: Add short DataLad intro as proposed in the handbook
424d1252 (mddatasrc/master) Add README
If things look kosher to you, please push to mddatasrc at https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms/commit/424d125221e0ff094e30031e6290aa251cb30ad2
@adswa
"I already did a git annex fsck --from mddatasrc which reported success, but did not update availability."
That is strange. It should update availability if there was a change, and it would be the way to go. When you say "reported success", do you mean a zero exit of the command, or that it reported finding those files via the special remote?
See my edit in that post, and most recent comment to @mih with a fix, @bpoldrack: The files in question differ in version between GitHub and datapub.
Datapub is outdated. It does not know the updated annex keys that GitHub knows about (but doesn't carry). So while a file on GitHub points to annex/objects/zW/..., datapub does not have this in its object tree yet (because this version of the file wasn't pushed to it yet, it only lives on data1). I figured this out when I tried to run git annex setpresentkey to manually add mddatasrc as a location for the key.
Another TODO for @mih:
I lack the permissions to do so, and this dataset is superfluous as I have forked the openneuro dataset as discussed in the chat as a maintained alternative to https://github.com/psychoinformatics-de/studyforrest-data-phase2-denoised_openneuro
TODO for me:
studyforrest-data superdataset~ The dataset wasn't a subdataset of studyforrest-data, it seems.
At the moment, the Studyforrest datasets hosted here on GitHub are all broken. The reason for this is a faulty special remote mddatasrc pointing to psydata.ovgu.de, which used to redirect to datapub.fz-juelich.de (where the data was migrated to), but was taken down recently. The first user issue that brought this problem to light is https://github.com/psychoinformatics-de/studyforrest-data-visualrois/issues/6.
Although I've only probed a handful of repositories/subdatasets in this repo, I believe they all have a now broken mddatasrc special remote registered. I suggest we put in a coordinated effort to fix this with as many people as possible. @bpoldrack outlined a fix for this issue in https://github.com/psychoinformatics-de/studyforrest-data-visualrois/issues/6. Here's my translation for the general procedure that anyone can follow:
[...] mddatasrc during cloning. If not, nevertheless try to retrieve data to make sure it all works. If everything works, move to the next dataset; if not, move to 3.
[...] remote.log and make sure there is only one mddatasrc special remote (git cat-file -p git-annex:remote.log is the command to do it). If there are two, leave a note, and move to the next dataset for now.
[...] mddatasrc special remote in remote.log [...] /.git
[...] mddatasrc using git remote remove mddatasrc
[...] mddatasrc using its UUID as an identifier, and the URL you constructed from datapub.fz-juelich.de (see example below) to fix the location information:
[...] datalad get to confirm that this fix worked, and retrieval from mddatasrc is possible again
[...] datalad push the changes back to GitHub. There is no need (or possibility) to do a pull request. Make sure that the git-annex branch gets successfully pushed. If you run into permission errors, seek help in the chat.
List of repositories:
[...] mddatasrc error)
[...] mddatasrc error)
[...] mddatasrc error; connection errors during clone and get)~ fixed!
[...] mddatasrc error)
[...] mddatasrc, leave for later!
https://github.com/psychoinformatics-de/studyforrest-data-templatetransforms (fixed, but some get "impossible" and "errors" remain. See datalad get sub-01)
[...] mddatasrc error; getting .gz files works, but .txt files look like availability was never pushed. See sub-05/ses-movie/func/sub-05_ses-movie_task-movie_run-1_desc-MELODICSm5_componentLabels.txt) -> this dataset was an external contribution, and has been replaced by a fork of the corresponding maintained openneuro dataset (https://github.com/psychoinformatics-de/studyforrest-data-phase2-denoised_openneuro)
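The remote.log check from the procedure above (exactly one special remote with the expected name, and its UUID for git annex enableremote) could be scripted roughly like this. It is a sketch under assumptions: remote.log fields are taken to be separated by single spaces, and the function names remote_uuids and single_remote_uuid are made up for illustration.

```shell
# Print the UUIDs of all remote.log entries whose name= field equals $1.
# Reads remote.log content on stdin, one remote per line:
#   <UUID> key=value key=value ... timestamp=...
# Assumption: fields are separated by spaces and the remote name contains no spaces.
remote_uuids() {
    awk -v want="name=$1" '{
        for (i = 2; i <= NF; i++)
            if ($i == want) { print $1; break }
    }'
}

# Print the UUID if exactly one entry matches; otherwise complain on stderr and fail.
single_remote_uuid() {
    uuids=$(remote_uuids "$1")
    # Count non-empty lines; awk always exits 0, even when nothing matched.
    count=$(printf '%s\n' "$uuids" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "$uuids"
    else
        echo "expected exactly one remote named $1, found $count" >&2
        return 1
    fi
}
```

Usage would look like: git cat-file -p git-annex:remote.log | single_remote_uuid mddatasrc. For datasets where the remote has a different name, such as psydata in studyforrest-data-aggregate, pass that name instead.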