khanlab / hippunfold

BIDS App for Hippunfold (automated hippocampal unfolding and subfield segmentation)
https://hippunfold.readthedocs.io
MIT License

Temporary files written to home directory #308

Open JosePMarques opened 2 weeks ago

JosePMarques commented 2 weeks ago

Dear Hippunfold developers, thanks for this great tool. I am using the Singularity version to run a large, BIDS-organized study. I am currently calling hippunfold in the following way:

# Specify the path to the subject folders
subject_path=/project/3022055.01/POM/derivatives/SEPIA

# Specify the output path (of the Hippunfold output)
output_path=/project/3032001.02/derivatives/Hippunfold

# Hippunfold sif file
sif_file=/project/3032001.02/bids/code/hippunfold_1.4.1.sif

export SINGULARITY_CACHEDIR=${output_path}/.cache/singularity
export SINGULARITY_BINDPATH=${output_path}:${output_path}
export HIPPUNFOLD_CACHE_DIR=${output_path}/.cache/hippunfold/

singularity run --cleanenv -e --bind :/tmp ${sif_file} ${subject_path}/${sub_name} ${output_path} participant --modality T1w --wildcards-T1w sub-{subject}_ses-{session}_T1w.nii.gz --cores all --force_output

Still, it keeps writing many files to our home directory ~:

sub-POMU587AF573977E1E6F_ses-POMVisit1_dir-IO_hemi-R_space-corobl_desc-all_mask_p2l-surf.nii.gz
sub-POMU587AF573977E1E6F_ses-POMVisit1_dir-IO_hemi-R_space-corobl_desc-SRLM_mask_p2l-surf.nii.gz
sub-POMU587AF573977E1E6F_ses-POMVisit1_dir-IO_hemi-R_space-corobl_desc-SRLM_mask_p2l-surf_layering-boundaries.nii.gz
sub-POMU587AF573977E1E6F_ses-POMVisit1_dir-IO_hemi-R_space-corobl_desc-SRLM_mask_p2l-surf_layering-depth.nii.gz
sub-POMU587AF573977E1E6F_ses-POMVisit1_dir-IO_hemi-R_space-corobl_desc-SRLM_mask_p2l-surf_layering-layers.nii.gz
ulevel.nii.gz

as well as ~/.cache/hippunfold/ and ~/.cache/snakemake

This creates disk-space problems, as we have a limited quota on our home directory. How can we make sure that these files get written to the output folder, or to some subject-specific output folder on the server?

Since we run this code in parallel, how can we be sure that the file ulevel.nii.gz is associated with the right subject?

Thanks in advance for your help,

José

akhanf commented 1 week ago

Hi José,

There are some steps in the workflow right now that do seem to write temp files to the output root folder (these are the ones you are pointing to), but they shouldn't cause any conflicts, as the names should be unique across a dataset. Having them appear in your home directory, however, is not the expected behavior.

There might be a few different things going on here -- I can't try to reproduce the bug as I don't have my laptop with me this week, but I will try to help in any case.

Normally all output (except for U-net models and other downloaded resources) goes in the output folder, not your home directory. The .snakemake folder is also supposed to end up in the output folder, so if you are seeing it in your home directory, I wonder whether something odd is happening with your bind paths (e.g. if the output folder isn't bound for some reason, it may be defaulting to your home dir, though I haven't actually seen that happen before).
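For illustration, a minimal sketch of binding the input and output directories explicitly rather than relying on SINGULARITY_BINDPATH (this reuses the ${subject_path}, ${output_path}, ${sif_file}, and ${sub_name} variables from the script above; it is a sketch, not a verified command line):

# Bind both the input and the output trees into the container explicitly,
# so nothing silently falls back to the home directory.
singularity run --cleanenv -e \
    --bind ${subject_path}:${subject_path} \
    --bind ${output_path}:${output_path} \
    ${sif_file} ${subject_path}/${sub_name} ${output_path} participant \
    --modality T1w --cores all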

There are some files that are expected to go in your home directory if not otherwise specified, but that location can be overridden with the HIPPUNFOLD_CACHE_DIR env var.
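One detail worth checking here (an inference from the command shown above, but standard Singularity behavior): with --cleanenv / -e, host environment variables such as HIPPUNFOLD_CACHE_DIR are stripped and never reach the container, so the cache would silently fall back to ~/.cache/hippunfold. Prefixing the variable with SINGULARITYENV_ forwards it even under --cleanenv:

# --cleanenv strips the plain export; the SINGULARITYENV_ prefix
# injects HIPPUNFOLD_CACHE_DIR into the container anyway.
export SINGULARITYENV_HIPPUNFOLD_CACHE_DIR=${output_path}/.cache/hippunfold/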

To help debug, could you provide the full output log (stderr and stdout), and also run some commands before hippunfold to print debug information -- specifically, to check the env vars and the paths of the input, output, and bind mounts:

env

singularity exec <same set of Singularity options you use> ls -l <path to input>
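A fuller (hypothetical) version of those checks, reusing the variables from the original script:

# Host side: confirm the relevant variables are actually set
env | grep -i -e singularity -e hippunfold

# Container side: confirm the input and output paths are visible inside the container
singularity exec --cleanenv -e --bind ${subject_path},${output_path} ${sif_file} ls -l ${subject_path}/${sub_name}
singularity exec --cleanenv -e --bind ${subject_path},${output_path} ${sif_file} ls -l ${output_path}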

Also, I'm not sure it makes a difference, but the --wildcards-T1w option isn't being used correctly; you could simply leave it out (subject and session will be parsed by default). And I'm not sure why you have a separate BIDS dataset for each subject -- is that intentional?

Ali

MariankaRinner commented 1 week ago

Hi Ali,

Thank you for your suggestions. I'm trying to use hippunfold with José for my master's project.

We intentionally run each subject separately due to wall time limitations. This way, several subjects can run in parallel.
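A minimal sketch of this kind of per-subject parallelization (hypothetical: the subject IDs are placeholders, and on a cluster each iteration would be wrapped in a job submission rather than backgrounded):

# Launch one hippunfold run per subject; a subject-specific output folder
# (an assumption, not something settled in this thread) keeps parallel runs
# from colliding in a shared output tree.
for sub_name in sub-POMU587AF573977E1E6F sub-POMUF61E7C93AFFF6F01; do
    singularity run --cleanenv -e --bind ${subject_path},${output_path} \
        ${sif_file} ${subject_path}/${sub_name} ${output_path}/${sub_name} participant \
        --modality T1w --cores all &
done
wait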

I've looked at the binding of HIPPUNFOLD_CACHE_DIR (and of SINGULARITY_CACHEDIR and SINGULARITY_BINDPATH). They were not bound correctly at first, but now they are. Apart from that, removing --wildcards-T1w gives the following error:

Traceback (most recent call last):
  File "/opt/conda/bin/hippunfold", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/hippunfold/run.py", line 20, in main
    app.run_snakemake()
  File "/opt/conda/lib/python3.9/site-packages/snakebids/app.py", line 217, in run_snakemake
    snakemake.main(  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/snakemake/__init__.py", line 3141, in main
    success = snakemake(
  File "/opt/conda/lib/python3.9/site-packages/snakemake/__init__.py", line 570, in snakemake
    update_config(overwrite_config, load_configfile(f))
  File "/opt/conda/lib/python3.9/site-packages/snakemake/io.py", line 1720, in load_configfile
    config = _load_configfile(configpath)
  File "/opt/conda/lib/python3.9/site-packages/snakemake/io.py", line 1694, in _load_configfile
    obj = open(configpath_or_obj, encoding="utf-8")
FileNotFoundError: [Errno 2] No such file or directory: 'sub-{subject}_ses-{session}_acq-GREhighres_run-1_T2star_denoise_S0map_T1w.nii.gz'
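A possible reading of this traceback (an inference, not something confirmed in the thread): Snakemake is trying to open the leftover filename pattern as a config file, which would happen if the pattern string stayed on the command line after the --wildcards-T1w flag itself was removed. If so, the fix would be to drop the pattern argument together with the flag, e.g. (a sketch reusing the variables from the original script):

# Hypothetical corrected call: neither --wildcards-T1w nor its
# filename-pattern argument appears on the command line.
singularity run --cleanenv -e \
    --bind ${subject_path}:${subject_path} --bind ${output_path}:${output_path} \
    ${sif_file} ${subject_path}/${sub_name} ${output_path} participant \
    --modality T1w --cores all --force_output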

Furthermore, replacing singularity run with singularity exec gives: FATAL: is a directory. However, just running

singularity exec --cleanenv -e ${sif_file} ls -l ${subject_path}

works, but not when the Singularity options are added -- then the FATAL error occurs.

When the program errors during a run, it is often at "Error in rule equivolume_coords:", regarding the file sub-POMUF61E7C93AFFF6F01_ses-POMVisit1_dir-IO_hemi-R_space-corobl_desc-SRLM_mask.nii.gz. The files saved in the home directory were ...dir-IO_hemi-R_space-corobl_desc-SRLM_mask....nii.gz files as well -- could these things be related?

Thank you in advance,

Marianka