RF: nipype interface to datalad (eg. DataladGrabber)?

nipy / nipype

Workflows and interfaces for neuroimaging packages

https://nipype.readthedocs.org/en/latest/

RF: nipype interface to datalad (eg. DataladGrabber)? #2191

Open oesteban opened 7 years ago

oesteban commented 7 years ago

I'm sure @yarikoptic already has some thoughts about this. I would like to gauge how many people would be interested in a Nipype interface that leverages datalad to get any data. I'm particularly thinking of one use case: fetching only one subject from one openfmri dataset.

satra commented 7 years ago

@oesteban - we do this for all our workshops now. not even one subject at times :)

there are a few things we have contemplated with @yarikoptic:

1. automatic data grabbing by inserting a datalad module that overrides Python's built-in open.
2. a more explicit grabber.

and perhaps a combination of both ideas.

oesteban commented 7 years ago

We implemented a test battery that grabs random subjects from openfmri, and we are using datalad for this. I guess my proposal was basically option 2). I think we can allocate some time to this, so if someone starts with it let us know :D
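The random-subject selection described here could look roughly like the sketch below. It assumes a BIDS-style layout (one `sub-*` directory per subject) in an already installed dataset, where the directory tree is available even though file content has not been fetched yet; `pick_random_subjects` is a hypothetical helper, not part of nipype or datalad.

```python
import os
import random


def pick_random_subjects(dataset_dir, n=1, seed=None):
    """Return the names of `n` randomly chosen subject directories (sub-*)
    in a dataset checkout. Only the directory tree is inspected, so no
    annexed content needs to be present."""
    subjects = sorted(
        name for name in os.listdir(dataset_dir)
        if name.startswith("sub-") and os.path.isdir(os.path.join(dataset_dir, name))
    )
    # A dedicated Random instance keeps the choice reproducible for a given seed.
    rng = random.Random(seed)
    return rng.sample(subjects, n)
```

The chosen directories could then be passed to `datalad.api.get`, which fetches all annexed content under a directory; a fixed `seed` keeps a test run reproducible.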

yarikoptic commented 5 years ago

ATM I am using a kludge for a DataGrabber:

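```python
# make sure that we have all input data (`files` is presumably the result of running a DataGrabber)
import datalad.api as dl
from nipype.interfaces.base import isdefined

for v in files.outputs.trait_names():
    if v.startswith('trait_'):  # skip the traits machinery's own trait_added/trait_modified
        continue
    value = getattr(files.outputs, v)
    if isdefined(value):  # unset outputs are Undefined, not paths
        dl.get(value)
```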

So maybe there could simply be a runtime option for nipype to make DataGrabber "datalad get" all the files it grabs? It would add a slight delay, since datalad would need to check those files, but it might very well be worth it: more targeted (although not as functional) than 1, and more seamless than 2 ;-)
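A slightly more general version of the kludge above, roughly what such a runtime option would have to do: flatten every output (DataGrabber outputs can be single paths or nested lists of paths), skip values that are not paths, and hand all paths to a single fetch call. This is a sketch; `fetch_grabbed_files` and `collect_paths` are hypothetical names, and the outputs are assumed to have already been turned into a plain dict (for example via the outputs object's `get()` method).

```python
import os


def collect_paths(value):
    """Flatten an output value (a path, or nested lists/tuples of paths)
    into a flat list of path strings."""
    if isinstance(value, (list, tuple)):
        paths = []
        for item in value:
            paths.extend(collect_paths(item))
        return paths
    if isinstance(value, (str, os.PathLike)):
        return [os.fspath(value)]
    # Anything else (None, nipype's Undefined, ...) is not a file to fetch.
    return []


def fetch_grabbed_files(outputs, fetch):
    """Call `fetch` once with every unique path found in `outputs`, a mapping
    of output name -> value. Returns the list of paths that were requested."""
    paths = []
    for name, value in outputs.items():
        if name.startswith("trait_"):
            continue  # traits bookkeeping entries, as in the kludge above
        paths.extend(collect_paths(value))
    unique = list(dict.fromkeys(paths))  # de-duplicate, keep first-seen order
    if unique:
        fetch(unique)
    return unique
```

With datalad installed, the call might be `fetch_grabbed_files(files.outputs.get(), datalad.api.get)`; `datalad.api.get` accepts a list of paths, so all files are requested in one call.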

satra commented 5 years ago

@yarikoptic - a few comments/questions on this:

1. perhaps this can be done at the File trait level? is there a datalad API to check if a given path is a datalad file?
2. if a dataset is mounted readonly into a container, how would this work?
3. what if there are multiple locations/datalad datasets for files?
4. the alternative to using the File trait is to consider extending the glob process, which is a subfunction, and adding an optional input (e.g., datalad_fetch=True).
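The glob-based suggestion (extending the glob step with an optional datalad_fetch input) could look roughly like the sketch below. Rather than relying on a datalad call to detect annexed files, it uses a plain filesystem heuristic: in git-annex's default (locked) mode, an annexed file without its content is a dangling symlink into .git/annex/objects. Unlocked files are plain pointer files and would not be detected. `grab` and `is_annexed_but_absent` are hypothetical names.

```python
import glob
import os


def is_annexed_but_absent(path):
    """Heuristic: a locked annexed file whose content is missing is a symlink
    into .git/annex/objects whose target does not exist yet."""
    if not os.path.islink(path):
        return False
    target = os.readlink(path).replace(os.sep, "/")
    return ".git/annex/objects" in target and not os.path.exists(path)


def grab(pattern, datalad_fetch=False, fetch=None):
    """Glob for files (broken symlinks are included, as in the shell); if
    datalad_fetch is True, fetch the content of matches that look absent."""
    matches = sorted(glob.glob(pattern))
    if datalad_fetch:
        if fetch is None:
            import datalad.api as dl  # only imported when fetching for real
            fetch = dl.get
        missing = [m for m in matches if is_annexed_but_absent(m)]
        if missing:
            fetch(missing)
    return matches
```

Keeping `datalad_fetch=False` as the default means existing workflows would behave exactly as before, and datalad would only be imported when the option is actually used.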

oesteban commented 5 years ago

1. perhaps this can be done at the File trait level? is there a datalad api to check if a given path is a datalad file?

+1 to handle this at trait level.

2. if a dataset is mounted readonly into a container, how would this work?

Hardly; the same problem arises if you are using Singularity, which is read-only by default. Fortunately, we are dealing with these problems in TemplateFlow and have found some compromises.

3. what if there are multiple locations/datalad datasets for files?

What do you mean?

satra commented 5 years ago

what if there are multiple locations/datalad datasets for files? What do you mean?

don't know if you can do a datalad get from an absolute path from an arbitrary location.

effigies commented 5 years ago

don't know if you can do a datalad get from an absolute path from an arbitrary location.

Yes, if you do datalad get with an absolute path, it will find the appropriate git root and ensure that file is available at that path. Or am I misunderstanding?
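To illustrate what finding "the appropriate git root" involves, and how that covers files spread over several datasets, here is a sketch that walks up from each file to the nearest directory containing a .git entry and groups the files per dataset. According to the comment above, datalad already does this resolution itself for absolute paths, so this is only an illustration; `find_dataset_root` and `group_by_dataset` are hypothetical helpers.

```python
import os
from collections import defaultdict


def find_dataset_root(path):
    """Walk up from a file path to the nearest directory containing a `.git`
    entry (a directory, or a file for some submodules). None if not found."""
    current = os.path.dirname(os.path.abspath(path))
    while True:
        if os.path.exists(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def group_by_dataset(paths):
    """Map each dataset root to the absolute paths requested inside it, so
    files from several datasets can be fetched dataset by dataset."""
    groups = defaultdict(list)
    for p in paths:
        groups[find_dataset_root(p)].append(os.path.abspath(p))
    return dict(groups)
```

Each group could then be fetched with something like `datalad.api.get(files, dataset=root)`, skipping the `None` group of paths that are not inside any dataset.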