Using fmriprep 20.1.1 (setup.cfg shows `sdcflows ~= 1.3.1`) in Singularity on a cluster, the qwarp step of the pepolar pipeline has now been running for more than 50 hours on multiple jobs/runs of the pipeline on data from the same dataset. Is this expected to take that long? If not, what can I do to diagnose the problem?
Thanks!
No, that is not expected. Is the 3dQwarp tool actually running on the node, or did you just check the output log?
3dQwarp was actually running on the node (the job timed out), high CPU usage and all. I will check what is left in the workdir, the input files, etc.
- Ran the same 3dQwarp command outside of Singularity with the NeuroDebian AFNI on my laptop: completes quickly.
- Ran the same command inside the Singularity image on my laptop: completes quickly.
- Ran the same command inside the Singularity image on the cluster: completes quickly, but this was without OpenMP (as set in the Dockerfile).
- Ran the same command inside the Singularity image on the cluster with `OMP_NUM_THREADS=8`: got the following error, but it still completes faster, as expected.

    skipping - powell_newuoa_con() failure code=-1
    + powell_newuoa_con( ndim=16 x=0x2921f90 xbot=0x29229e0 xtop=0x2922a70 nrand=0 rstart=0.444000 rend=0.003996 maxcall=159 ufunc=0x49c3a0
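For reference, a minimal Python sketch of re-running the same command with a pinned thread count. The arguments and file names below are placeholders; the real invocation can be copied from the qwarp node's `command.txt` in the nipype working directory.

```python
import os
import subprocess
import time

# Placeholder invocation: copy the real arguments from the qwarp node's
# command.txt in the nipype working directory.
cmd = [
    "3dQwarp",
    "-plusminus", "-noweight",
    "-base", "epi_reversed.nii.gz",    # placeholder inputs
    "-source", "epi_forward.nii.gz",
    "-prefix", "qwarp_test",
]

env = dict(os.environ, OMP_NUM_THREADS="8")  # pin the OpenMP team size
start = time.perf_counter()
subprocess.run(cmd, env=env, check=True)
print(f"elapsed: {time.perf_counter() - start:.1f}s")
```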
I see two possibilities. One involves `--resource-monitor`: I use it to evaluate the resources needed on a subset of the dataset, to then heuristically request them more accurately from SLURM. I have no idea how this works in nipype, but I can imagine that the processes might have some inter-dependencies, because one is monitoring the other.
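For context, nipype's resource monitor is a config switch that periodically samples each node's process tree for CPU/RSS usage, which is what makes the monitored process and the monitor inter-dependent. A minimal sketch of how it is enabled (fmriprep's `--resource-monitor` flag appears to toggle exactly this):

```python
from nipype import config

# Equivalent of what fmriprep's --resource-monitor appears to toggle:
config.enable_resource_monitor()

# ...which is the same as setting the config options directly:
config.set("monitoring", "enabled", "true")
config.set("monitoring", "sample_frequency", "1")  # seconds between samples
```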
I will resubmit the pipeline without `--resource-monitor` to see if it completes.

Launched it without `--resource-monitor`: it is slow but progressing (strace on the node shows regular progress output from 3dQwarp on stderr). It could be slow because multiple 3dQwarp processes with `OMP_NUM_THREADS=8` are running concurrently under `n_cpus=8`, since the subject has multiple BOLD runs.
It is even slower than running with `OMP_NUM_THREADS=1`. At the current pace, it might take 50+ hours.
In parallel, I launched a single 3dQwarp node through nipype (load the `.pklz`, run) in Singularity. The processing speed is normal and it completes.
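A minimal sketch of that load-and-run step, assuming nipype's usual working-directory layout (the path below is a placeholder for the actual qwarp node directory):

```python
from nipype.utils.filemanip import loadpkl

# Placeholder path: point this at the cached node inside the nipype workdir.
PKLZ = "work/.../qwarp/_node.pklz"

node = loadpkl(PKLZ)
node.base_dir = "/tmp/qwarp_rerun"  # re-run in a scratch location
result = node.run()
print(result.outputs)
```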
I reran the pipeline with `--n_cpus=16` and `--omp-nthreads=4`, and it completed in a reasonable amount of time.
I am not familiar with OpenMP programming, but it seems that when concurrent processes each use the maximum number of CPU threads, a lot of time might be spent in context switching.
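A back-of-the-envelope illustration of that hypothesis, with assumed numbers from this thread (the concurrent-process count is a guess):

```python
# Rough arithmetic behind the context-switching hypothesis. The number of
# concurrently scheduled 3dQwarp processes is an assumption (one per BOLD
# run that MultiProc happens to schedule together).
physical_cores = 8        # SLURM allocation (n_cpus=8)
concurrent_qwarps = 4     # assumed
omp_threads_each = 8      # OMP_NUM_THREADS=8

total_threads = concurrent_qwarps * omp_threads_each
print(f"{total_threads} runnable threads on {physical_cores} cores -> "
      f"{total_threads / physical_cores:.0f}x oversubscription")
```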
Is the problem that we're just not marking 3dQwarp as a multithreaded node?
It seems that `n_procs` is specified: https://github.com/nipreps/sdcflows/blob/c3ffa59418303854f8ca9d1ff1c2a7f22978e643/sdcflows/workflows/pepolar.py#L123-L126
Yeah, you're right. Sorry, not able to attend closely rn.
Should we not allow 3dQwarp to access more than 4 CPUs? It's not clean at all, but it would prevent this from happening, it seems.
So `qwarp_nprocs = min(omp_nthreads, 4)`? I don't see that as inelegant, if 3dQwarp can't usefully take more.
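A sketch of where that cap could slot in, loosely modeled on the linked pepolar.py node definition (interface arguments omitted; `omp_nthreads` is given an example value here, normally it comes from the workflow arguments):

```python
from nipype.pipeline import engine as pe
from nipype.interfaces import afni

omp_nthreads = 8  # example value; the real one comes from the workflow

# Proposed cap: don't hand 3dQwarp more threads than it can usefully exploit.
qwarp_nprocs = min(omp_nthreads, 4)

qwarp = pe.Node(
    afni.QwarpPlusMinus(environ={"OMP_NUM_THREADS": str(qwarp_nprocs)}),
    name="qwarp",
    n_procs=qwarp_nprocs,  # lets the MultiProc plugin budget this node's CPUs
)
```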