[DAT]: Add ENKI dataset

juaml / junifer

Forschungszentrum Jülich Neuroimaging Feature Extractor

https://juaml.github.io/junifer

GNU Affero General Public License v3.0


[DAT]: Add ENKI dataset #47

Open fraimondo opened 2 years ago

fraimondo commented 2 years ago

Which dataset is it?

Enki dataset for Juseless

Implementation

Did not implement anything

Your implementation

No response

Dataset access restrictions

[ ] Public and open access (no registration)
[ ] Public and open access (registration required)
[X] Restricted access (needs approval)
[X] Available only in specific locations (Juseless, Jureca)

Anything else to say?

No response

synchon commented 2 years ago

@fraimondo Is this the eNKI one or the eNKI-pheno one?

LeSasse commented 2 years ago

We interpreted it as the eNKI processed data on juseless: https://github.com/juaml/junifer/tree/add_enki_datagrabber (datagrabber here)

synchon commented 2 years ago

@LeSasse Ah great, you have already started working on it. 👍

verakye commented 2 years ago

So far, the anatomical and the BOLD data from the fmriprep output are included. Do we also want to add the Freesurfer data?

LeSasse commented 2 years ago

One problem that we have also identified with the dataset at /data/project/enki/processed is that some files do not actually exist. They are shown by the file system, so the PatternDataGrabber thinks they are there, but the files cannot be accessed. One example is at /data/project/enki/processed/fmriprep/sub-A00054581/ses-CLG5/func/sub-A00054581_ses-CLG5_task-checkerboard_acq-1400_space-MNI152NLin6Asym_desc-preproc_bold.nii.gz.

It's a symlink that is shown by the file system, so the datagrabber seems to find it, but then the test assert out["BOLD"]["path"].exists() fails, because the file the symlink points to does not in fact exist.
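
For readers who want to reproduce the behaviour described above, here is a minimal sketch in plain Python (not junifer code). It assumes a POSIX file system where symlinks can be created; the helper name is_dangling_symlink and the file names are made up for illustration. It shows why a directory listing still reports the entry while exists() returns False.

```python
import os
import tempfile
from pathlib import Path


def is_dangling_symlink(path: Path) -> bool:
    """Return True if `path` is a symlink whose target does not exist."""
    # is_symlink() inspects the link itself, while exists() follows the link.
    return path.is_symlink() and not path.exists()


# Self-contained demonstration in a temporary directory (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "target_bold.nii.gz"
    link = Path(tmp) / "link_bold.nii.gz"
    target.write_bytes(b"")
    link.symlink_to(target)

    dangling_before = is_dangling_symlink(link)

    # Remove the target but keep the link, as happened with the removed scans.
    target.unlink()

    still_listed = link.name in os.listdir(tmp)
    dangling_after = is_dangling_symlink(link)
```

After the target is removed, the link still appears in the directory listing, which is presumably why a pattern-based search finds it, but following it fails.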

fraimondo commented 2 years ago

@LeSasse: is there a subdataset in this dataset? Is that why the file is not there?

verakye commented 2 years ago

As far as I know it's not because of a subdataset but because some files were removed after the brain scans failed quality control. However, the symlinks still exist. There was a discussion with the datalad people about whether the dataset should be "cleaned", but for now it sounded as if this wasn't planned for the near future.

fraimondo commented 2 years ago

Then there is not much to do, unless those subjects are manually excluded. Any project dealing with this dataset needs to exclude these subjects manually. You just can't go through all the data and check if a file exists (i.e. that the symlink points to the right place).
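
A manual exclusion of this kind could be sketched as follows. This is plain Python, not the junifer API: the function exclude_elements, the (subject, session) tuple layout and the second subject ID are assumptions made for illustration. Only sub-A00054581 / ses-CLG5 comes from the example in this thread.

```python
def exclude_elements(elements, excluded):
    """Return the elements that are not in the manual exclusion list."""
    excluded = set(excluded)
    return [element for element in elements if element not in excluded]


# (subject, session) pairs; the second entry is a made-up placeholder.
all_elements = [
    ("sub-A00054581", "ses-CLG5"),
    ("sub-EXAMPLE", "ses-EXAMPLE"),
]

# Pairs known to have missing files, maintained by hand per project.
manually_excluded = [("sub-A00054581", "ses-CLG5")]

kept = exclude_elements(all_elements, manually_excluded)
```

The exclusion list stays with the project that uses the dataset, so the datagrabber itself does not need to change.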

LeSasse commented 2 years ago

In that case I will leave the grabber as is and make a pull request for now.

LeSasse commented 2 years ago

> As far as I know it's not because of a sub dataset but because some files were removed because the brain scans didn't pass some quality controls. However the symlinks still exist. There was a discussion with the datalad people if the dataset should be "cleaned" but for now it sounded as if this wasn't planned for the near future.

I believe the plan to "remove" the enki dataset has been followed through, as I cannot find the path /data/project/enki/processed anymore. @fraimondo @verakye

verakye commented 2 years ago

I also can't find it anymore. I am currently waiting for Alex' reply about it.

fraimondo commented 2 years ago

Then this issue will need to be on hold for the moment.

verakye commented 2 years ago

According to Alex, the future of the dataset depends "on stakeholders putting in effort to declare what they want". He hasn't heard anyone mention anything concrete recently, so for now it is unclear. The history of the dataset is here (https://jugit.fz-juelich.de/inm7/datasets/datasets_repo/-/issues/71), and that is where people would/should report what they want for its future, so we should be able to see there if things change or if new plans for this dataset come up.