Open JRJacoby opened 2 years ago
So eventually it continued with:
211215-15:57:08,660 nipype.workflow DEBUG:
adding multipath trait: segmentation_file
211215-15:57:08,667 nipype.workflow DEBUG:
adding multipath trait: summary_file
211215-15:57:10,591 nipype.workflow DEBUG:
[Node] Setting 1 connected inputs of node "segstats" from 1 previous nodes.
211215-15:57:10,673 nipype.workflow DEBUG:
Outputs object of loaded result /autofs/vast/citadel/studies/hcpa/users/john/analyses/12_14_2021_HCAP_WM_segmentations/outputs/nipype_base_dir/ATT_WM_segstats/vol2vol/result_vol2vol.pklz is a Bunch.
211215-15:57:10,685 nipype.workflow DEBUG:
output: transformed_file
211215-15:57:10,695 nipype.workflow DEBUG:
So it looks like it was in fact doing something that whole time. Does anyone know what? The next MapNode (segstats) is now stuck in the same way - all the subnodes ran and then the main segstats node itself is running as a job. What's that job doing? It's been running much longer than a segstats command should take, all while outputting that same "Slots available: None" message.
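For reference, the cached result file mentioned in the log can be inspected directly to see what a node actually produced. This is just a sketch, not something from the original report; the path is abbreviated from the full path shown above.

```python
# Sketch: inspecting the cached result file referenced in the log above.
# The path is abbreviated here; substitute the full result_vol2vol.pklz path.
from nipype.utils.filemanip import loadpkl

result = loadpkl("nipype_base_dir/ATT_WM_segstats/vol2vol/result_vol2vol.pklz")
print(result.outputs)   # the outputs Bunch the log refers to
print(result.runtime)   # runtime information recorded for the node
```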
Summary
A MapNode gets stuck trying to submit a job to the cluster.
Actual behavior
I have a data loading function that passes lists into a MapNode. Each subnode that the MapNode creates runs successfully, but then the workflow never progresses to the next MapNode. Here's the last section of the debug log:
211215-15:45:34,207 nipype.workflow DEBUG:
[Node] No hashfiles found in "/autofs/vast/citadel/studies/hcpa/users/john/analyses/12_14_2021_HCAP_WM_segmentations/outputs/nipype_base_dir/ATT_WM_segstats/vol2vol".
211215-15:45:34,213 nipype.workflow DEBUG:
Checking hash "ATT_WM_segstats.vol2vol" locally: cached=False, updated=False.
211215-15:45:34,510 nipype.workflow DEBUG:
Ran command (sbatch --account bandlab --partition basic --mem 4GB --time 48:00:00 -o /autofs/vast/citadel/studies/hcpa/users/john/analyses/12_14_2021_HCAP_WM_segmentations/outputs/nipype_base_dir/ATT_WM_segstats/batch/slurm-%j.out -e /autofs/vast/citadel/studies/hcpa/users/john/analyses/12_14_2021_HCAP_WM_segmentations/outputs/nipype_base_dir/ATT_WM_segstats/batch/slurm-%j.out -J vol2vol.ATT_WM_segstats.jj1006 /autofs/vast/citadel/studies/hcpa/users/john/analyses/12_14_2021_HCAP_WM_segmentations/outputs/nipype_base_dir/ATT_WM_segstats/batch/batchscript_pyscript_20211215_154534_ATT_WM_segstats_vol2vol.sh)
211215-15:45:34,517 nipype.workflow DEBUG:
submitted sbatch task: 752954 for node vol2vol
211215-15:45:34,523 nipype.workflow INFO:
Finished submitting: ATT_WM_segstats.vol2vol ID: 0
211215-15:45:34,530 nipype.workflow DEBUG:
Slots available: None
211215-15:45:34,539 nipype.workflow DEBUG:
Progress: 681 jobs, 678/1/0 (done/running/ready), 1/2 (pendingtasks/waiting).
211215-15:45:34,591 nipype.interface DEBUG:
args -j 752954
211215-15:45:34,725 nipype.workflow DEBUG:
Tasks currently running: 1. Pending: 1.
211215-15:45:34,738 nipype.workflow DEBUG:
Slots available: None
211215-15:45:36,585 nipype.interface DEBUG:
args -j 752954
211215-15:45:36,725 nipype.workflow DEBUG:
Slots available: None
211215-15:45:38,588 nipype.interface DEBUG:
args -j 752954
211215-15:45:38,726 nipype.workflow DEBUG:
Slots available: None
211215-15:45:40,594 nipype.interface DEBUG:
args -j 752954
211215-15:45:40,733 nipype.workflow DEBUG:
Slots available: None
And it just continues like that indefinitely.
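To make the shape of the pipeline concrete, here is a rough sketch of this kind of setup. The interface choices, the data-loading function body, and the connections are assumptions inferred from the node and trait names in the log (vol2vol, segstats, segmentation_file, summary_file), not the actual script.

```python
# Rough sketch (assumptions, not the real script): a data-loading Function
# node feeding parallel lists into chained MapNodes.
import nipype.pipeline.engine as pe
from nipype.interfaces.utility import Function
from nipype.interfaces.freesurfer import ApplyVolTransform, SegStats


def load_data():
    # Return parallel lists, one entry per subject (paths are placeholders).
    source_files = ["/path/to/sub-01.nii.gz", "/path/to/sub-02.nii.gz"]
    summary_files = ["sub-01_segstats.txt", "sub-02_segstats.txt"]
    return source_files, summary_files


loader = pe.Node(
    Function(output_names=["source_files", "summary_files"], function=load_data),
    name="loader",
)

# One subnode per list entry; registration/target inputs omitted for brevity.
vol2vol = pe.MapNode(ApplyVolTransform(), iterfield=["source_file"], name="vol2vol")
segstats = pe.MapNode(
    SegStats(), iterfield=["segmentation_file", "summary_file"], name="segstats"
)

wf = pe.Workflow(name="ATT_WM_segstats", base_dir="nipype_base_dir")
wf.connect(loader, "source_files", vol2vol, "source_file")
wf.connect(loader, "summary_files", segstats, "summary_file")
wf.connect(vol2vol, "transformed_file", segstats, "segmentation_file")
```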
Expected behavior
For the whole workflow to run all the way through.
How to replicate the behavior
I'm not exactly sure what's causing it. Any MapNode run with the SLURM plugin seems to trigger it. I've pasted my entire script below.
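For illustration only (this is not the script referenced below), the submission call looks roughly like the following, with `wf` being a workflow such as the sketch above and the sbatch arguments mirroring the submitted command visible in the debug log:

```python
# Sketch of running a workflow through nipype's SLURM plugin; the sbatch
# arguments are copied from the command shown in the debug log above.
wf.run(
    plugin="SLURM",
    plugin_args={
        "sbatch_args": "--account bandlab --partition basic --mem 4GB --time 48:00:00"
    },
)
```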
Script/Workflow details
Here's the whole code:
Platform details:
Execution environment