psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org

unable to drop sourcedata after hirni-spec2bids #183

Closed pvavra closed 3 years ago

pvavra commented 3 years ago

I want to drop all the files (dicoms) that were retrieved during the conversion from bids/sourcedata, but I fail to do so.

datalad create bids
datalad run-procedure -d bids cfg_bids
datalad create sourcedata
datalad run-procedure -d sourcedata cfg_hirni
datalad hirni-import-dcm -d sourcedata some_dicoms.tar acq1
datalad install -d bids -r -s sourcedata
datalad get -d bids/sourcedata -r
datalad hirni-spec2bids -d bids sourcedata/acq1/studyspec.json
# all works fine till here

A simple datalad drop . from inside bids/sourcedata tells me that the dicoms were dropped, but running du -sh still reports 4.8GB.

Now, this is related to a previous issue, so I also tried:

cd acq1/dicoms
git annex drop --force --branch origin/incoming

Here I changed the branch name to reflect that this is an installed dataset (I used git branch -a to verify that an incoming branch does not exist).

Now we are down to 2.5GB.

The remaining files are located in .git/datalad/tmp/ and are the unpacked dicoms. What is the proper way of dropping these files? Since they are inside the .git folder, I hesitate to simply run rm -r on them.

My goal is to get back to the same state I would have after a fresh datalad install ... call.

Note that I explicitly do not want to uninstall this dataset. In the real setting, I've set up the above for an ongoing project where I will constantly be adding new acquisitions to sourcedata as more data is collected, and I want to propagate them down into the bids folder. However, data collection is slow at the moment, and I do not want to waste disk space on the cloned files in the meantime.

bpoldrack commented 3 years ago

This looks like a bug. .git/datalad/tmp is named tmp for a reason and should have been deleted. Need to have a closer look, but meanwhile it's okay to rm -rf .git/datalad/tmp. This doesn't destroy anything.
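Before removing anything, it may help to see how much space the leftover directory actually takes. A minimal sketch, assuming the dataset's .git is a regular directory as described above; report_datalad_tmp is a hypothetical helper name, not a DataLad command:

```shell
# Print the size of a dataset's .git/datalad/tmp, if it exists.
# report_datalad_tmp is a hypothetical helper, not part of DataLad.
report_datalad_tmp() {
    ds="${1:-.}"                      # dataset root, defaults to the current directory
    tmp="$ds/.git/datalad/tmp"
    if [ -d "$tmp" ]; then
        du -sh "$tmp"                 # human-readable total for the leftover files
    else
        echo "no leftover tmp directory in $ds"
    fi
}
```

Calling report_datalad_tmp path_to_bids/sourcedata/acq1/dicoms before and after the cleanup should show whether the unpacked dicoms are gone.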

bpoldrack commented 3 years ago

So, the first part is probably answered by #182, while the second part is about the leftover .git/datalad/tmp.

Turns out this is actually the default behavior of the datalad-archives special remote, which I had forgotten about. The proper way to clean up ATM is to use the datalad clean command. However, I do consider this to be a bad choice for default behavior, so I opened a corresponding issue in datalad, since this isn't really about hirni itself. Thus closing here. Feel free to reopen, @pvavra, if you feel that I missed an aspect.

pvavra commented 3 years ago

Just to document this in one place: one needs the following three commands to drop all files of a single acquisition under bids/:

# from within an acquisition's dicom folder:
cd path_to_bids/sourcedata/acq1/dicoms

# drop "incoming" branch
git annex drop --force --branch origin/incoming

# drop working-tree version
git annex drop * -q

# clean up .git/datalad/tmp
datalad clean
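
With several acquisitions, the three commands above could be wrapped in a small helper. This is only a sketch under the layout used in this thread (path_to_bids/sourcedata/<acq>/dicoms); drop_acquisition, run and DRY_RUN are hypothetical names introduced for illustration, and git annex drop --quiet . is used in place of the * glob so that the whole directory is covered:

```shell
# Sketch: apply the three cleanup commands to one acquisition's dicom folder.
# drop_acquisition, run and DRY_RUN are hypothetical names, not DataLad features.

run() {
    # With DRY_RUN=1, only print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

drop_acquisition() {
    dicoms="$1"
    if [ ! -d "$dicoms" ]; then
        echo "skip: $dicoms not found" >&2
        return 1
    fi
    (
        # Subshell, so the caller's working directory is left unchanged.
        cd "$dicoms" || exit 1
        run git annex drop --force --branch origin/incoming   # "incoming" branch
        run git annex drop --quiet .                          # working-tree version
        run datalad clean                                     # .git/datalad/tmp
    )
}

# Possible usage, one call per acquisition:
#   for d in path_to_bids/sourcedata/*/dicoms; do drop_acquisition "$d"; done
```

Setting DRY_RUN=1 first prints the commands without touching any data, which seems worthwhile given the --force flag.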