fmriprep docker container stalling at "func_preproc_task_rest_wf.bold_std_trans_wf.bold_reference_wf.gen_ref" step

Heechberri commented 3 years ago

Hi all,

I have been trying to troubleshoot this problem for a week now, and would like some input from the experts.

I am running one subject as a test for an upcoming pipeline using the following command:

docker run -ti \
  -v $disk/$project_dir/$bids_dir:/data:ro \
  -v $disk/$project_dir/$preprocessed_dir:/fMRI \
  -v $freesurfer_licence:/opt/freesurfer/license.txt \
  -v $plugin_file:/plugin.yml \
  -v $disk/$project_dir/$preprocessed_dir/scratch:/scratch \
  poldracklab/fmriprep:latest \
  /data /fMRI/fMRI participant \
  --work-dir /scratch \
  --error-on-aroma-warnings --use-aroma \
  --output-spaces MNI152NLin6Asym:res-2 MNI152NLin6Asym:res-1 MNI152NLin2009cAsym:res-2 MNI152NLin2009cAsym:res-1 \
  --fd-spike-threshold 0.5 \
  --dvars-spike-threshold 20 \
  --use-plugin /plugin.yml \
  --dummy-scans 4 \
  --write-graph \
  --n_cpus '4' \
  --nthreads 2 \
  --omp-nthreads 4 \
  --mem 20GB \
  --low-mem \
  -vvvv \
  --resource-monitor

and the process has been stalling at the step:

[LegacyMultiProc] Running 1 tasks, and 3 jobs ready. Free memory (GB): 19.00/20.00, Free processors: 1/2. Currently running:

fmriprep_wf.single_subject_EXIC0001T_wf.func_preproc_task_rest_wf.bold_std_trans_wf.bold_reference_wf.gen_ref

for some time now. I have re-run the Docker container several times, both with fresh directories and resuming from the scratch directory. I have waited anywhere from overnight to 3 days, but it always stalls at the step above. FreeSurfer completed in the expected time (according to recon-all.log in the FreeSurfer scripts directory), but the functional outputs stall at this step. Correct me if I am wrong, but from what I have read, I assume the steps outside of FreeSurfer should complete within 12 hours, even with the limited resources I am using (Docker allocation: CPU 5, memory 25GB, swap 3.5, disk 320GB).

I suspect the problem has to do with resource allocation, so after reading through some of the posts here and on NeuroStars, I decided to add the following memory management flags to fmriprep:

--use-plugin /plugin.yml \ --n_cpus '4' \ --nthreads 2 \ --omp-nthreads 4 \ --mem 20GB \ --low-mem \

Even so, fmriprep has been stalling for the past 12 hours. Previously I also tried running at 4 threads without the --nthreads flag, and it stalled overnight as well. Is there anything else I can do?

My images are multiband and attached is the full terminal log with flag -vvvv.

In the terminal log file, I only copied the first few instances of "cannot allocate job..." because the -vvvv flag prints too many debugging messages. I have left the process running for more than 12 hours, and it has been printing "cannot allocate job..." every few seconds since then.

Also, I do not have any BIDS errors.
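For anyone who needs to trim a similarly noisy log before attaching it, here is a minimal Python sketch that keeps the first few lines containing a repeated message and counts the rest. The function name `condense_log`, its parameters, and the sample lines are all hypothetical and made up for illustration; they are not real fMRIPrep output, and the only text taken from this thread is the phrase "cannot allocate job".

```python
def condense_log(lines, needle="cannot allocate job", keep=3):
    """Keep the first `keep` lines containing `needle` and drop later ones.

    Returns (kept_lines, total_matches). Matching is case-insensitive.
    All names here are hypothetical helpers, not part of fMRIPrep or nipype.
    """
    kept = []
    matches = 0
    for line in lines:
        if needle.lower() in line.lower():
            matches += 1
            if matches > keep:
                continue  # drop repeats beyond the first `keep`
        kept.append(line)
    return kept, matches


# Made-up sample lines, for illustration only.
sample = [
    "workflow started",
    "Cannot allocate job A",
    "Cannot allocate job B",
    "some other debug message",
    "Cannot allocate job A",
    "Cannot allocate job B",
    "Cannot allocate job A",
]

kept_lines, total = condense_log(sample, keep=2)
print(f"{total} matching lines, {len(kept_lines)} lines kept")
```

To run it on a real file, pass an open file handle as `lines`, for example `condense_log(open(path))` with `path` pointing at your own log.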

Thank you!

terminal log.txt

effigies commented 3 years ago

I'm looking at this and not seeing anything obviously wrong. I am a bit confused by some of these choices:

--use-plugin /plugin.yml --n_cpus '4' --nthreads 2 --omp-nthreads 4 --mem 20GB --low-mem \

--n_cpus and --nthreads are the same parameter. Looking at your output, --nthreads 2 won, so you can only use up to two cores, but --omp-nthreads 4 means that multi-threaded jobs will claim four cores. My guess is that any job that actually tries to use 4 cores can't be scheduled, since you've said you only have 2.
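To make that guess concrete, here is a toy Python model of the kind of resource check a scheduler could perform. It is a sketch under stated assumptions, not nipype's actual code: the `Job` class, the `can_schedule` function, the job names, and the 1 GB memory estimates are all hypothetical. Only the numbers 2 (pool size from --nthreads), 4 (from --omp-nthreads), and 20 GB (from --mem) come from this thread.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A hypothetical job with the resources it asks for."""
    name: str
    n_threads: int
    mem_gb: float


def can_schedule(job, free_procs, free_mem_gb):
    """Return True only if the job fits in the free processors and memory."""
    return job.n_threads <= free_procs and job.mem_gb <= free_mem_gb


# Values taken from the thread.
total_procs = 2       # --nthreads 2 caps the pool at two processors
omp_nthreads = 4      # --omp-nthreads 4: multi-threaded jobs ask for four
total_mem_gb = 20.0   # --mem 20GB

# Hypothetical jobs; the 1 GB memory estimates are assumptions.
multi_threaded_job = Job("multi_threaded_job", n_threads=omp_nthreads, mem_gb=1.0)
single_threaded_job = Job("single_threaded_job", n_threads=1, mem_gb=1.0)

# Even with the whole pool idle, the four-thread job never fits.
for job in (multi_threaded_job, single_threaded_job):
    fits = can_schedule(job, total_procs, total_mem_gb)
    print(f"{job.name}: fits in an idle pool = {fits}")
```

In this model the four-thread job is rejected even when nothing else is running, so waiting longer cannot help. That would be consistent with a "cannot allocate job..." message repeating indefinitely while free memory stays high.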

I would recommend not setting --n_cpus, --nthreads or --omp-nthreads. The default will be to use 5 cores and up to 4 per job.

Heechberri commented 3 years ago

Thanks for the recommendations. I will stop the current run, change the inputs, and re-run accordingly. I will post an update if it completes :)

claytonjschneider commented 2 years ago

@Heechberri any update on this issue? I am running into an error at the same step, and I suspect it has to do with resource allocation. Whereas most people seem to run fMRIPrep on a cluster, I'm running it on a lab server and having difficulty deciding where to limit resources in order to parallelize single-subject runs.

nipreps/fmriprep, issue #2476