Submitting multiple SGEGraph jobs that share the same script name and work directory (but use different parameters) appears to cause a conflict between batch filenames.
For some flexibility at runtime, I haven't combined all of my fMRI runs into a single nipype instance; instead I submit a few jobs with separate iterables (perhaps not ideal).
On my local cluster, those jobs sit in the queue and then (possibly as jobs holding multiple CPU slots finish) several of them are accepted at the same time. Nipype batch filenames appear to be generated from a timestamp with one-second precision (e.g. 20200303_104010), so when two jobs start within the same second their filenames are not unique and they clobber each other's scripts.
I've modified my own scripts to use a separate work directory for each instance, which bypasses the issue, but it may be worth increasing the timestamp granularity or adding a unique identifier to batch filenames.
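To illustrate the collision, here is a minimal sketch (not nipype's actual code; the function names and the filename pattern are assumptions modeled on the observed `node_20200303_104010_..._selectfiles.b0.pklz` name) showing how a second-precision timestamp collides for simultaneous submissions, and how a short random suffix would keep names unique:

```python
import uuid
from datetime import datetime

def batch_name_by_second(node_name):
    # Second-precision timestamp, modeled on the observed
    # node_<YYYYmmdd_HHMMSS>_<name>.pklz pattern: two workflows that
    # reach this point within the same second produce the same name.
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    return "node_{}_{}.pklz".format(ts, node_name)

def batch_name_unique(node_name):
    # Appending a short random component (uuid4 hex) keeps the name
    # unique even when two submissions land in the same second.
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    return "node_{}_{}_{}.pklz".format(ts, uuid.uuid4().hex[:8], node_name)

# Two jobs accepted in the same second collide with the timestamp-only
# scheme, while the suffixed variant stays distinct:
print(batch_name_by_second("selectfiles.b0"))
print(batch_name_unique("selectfiles.b0"))
```

Microsecond precision (`%f` in the format string) would narrow the race window but not eliminate it; a random or per-process component (uuid, PID) removes it entirely.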
Actual behavior
200303-10:40:04,137 nipype.utils INFO:
No new version available.
200303-10:40:04,608 nipype.workflow INFO:
Generated workflow graph: /cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/graph.png (graph2use=hierarchical, simple_form=True).
200303-10:40:10,68 nipype.workflow INFO:
Workflow representation_mvpa settings: ['check', 'execution', 'logging', 'monitoring']
Traceback (most recent call last):
File "/cbica/projects/GraphLearning/project/scripts/mvpa/fsl_representation_mvpa.py", line 332, in <module>
'qsub_args': '-l h_vmem=10G,s_vmem=9.5G -j y',
File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/pipeline/engine/workflows.py", line 632, in run
runner.run(execgraph, updatehash=updatehash, config=self.config)
File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/pipeline/plugins/base.py", line 583, in run
create_pyscript(node, updatehash=updatehash, store_exception=False)
File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/pipeline/plugins/tools.py", line 114, in create_pyscript
savepkl(pkl_file, dict(node=node, updatehash=updatehash))
File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/utils/filemanip.py", line 727, in savepkl
os.rename(tmpfile, filename)
FileNotFoundError: [Errno 2] No such file or directory: '/cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/batch/node_20200303_104010_representation_mvpa_selectfiles.b0.pklz.tmp' -> '/cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/batch/node_20200303_104010_representation_mvpa_selectfiles.b0.pklz'
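The traceback is consistent with two workflows writing to the same fixed `<name>.pklz.tmp` path: one renames (or removes) the temp file before the other's `os.rename` runs. A hedged sketch of a collision-proof variant, using a process-unique temp file instead of a shared `.tmp` suffix (this is an illustration, not nipype's `savepkl` implementation, and it omits the gzip handling the real `.pklz` writer would need):

```python
import os
import pickle
import tempfile

def savepkl_atomic(filename, obj):
    # Write to a temp file whose name mkstemp guarantees is unique,
    # created in the target directory so the final rename stays on
    # one filesystem (os.rename is atomic on POSIX only then).
    dirname = os.path.dirname(filename) or "."
    fd, tmppath = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fp:
            pickle.dump(obj, fp)
        os.rename(tmppath, filename)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmppath):
            os.remove(tmppath)
        raise
```

Because each writer gets its own temp path, concurrent jobs can no longer delete each other's in-flight files; at worst the last rename wins.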
Expected behavior
200303-10:40:04,139 nipype.utils INFO:
No new version available.
200303-10:40:05,595 nipype.workflow INFO:
Generated workflow graph: /cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/graph.png (graph2use=hierarchical, simple_form=True).
200303-10:40:10,49 nipype.workflow INFO:
Workflow representation_mvpa settings: ['check', 'execution', 'logging', 'monitoring']
How to replicate the behavior
Submit multiple jobs with the same script and work directory so that they start within the same second.
Script/Workflow details
Platform details:
(analysis) [arikahn@cubic-login2 logs]$ python -c "import nipype; from pprint import pprint; pprint(nipype.get_info())"
200303-11:00:10,768 nipype.utils INFO:
No new version available.
{'commit_hash': 'c5ce0e2c9',
'commit_source': 'installation',
'networkx_version': '2.3',
'nibabel_version': '2.5.1',
'nipype_version': '1.4.2',
'numpy_version': '1.18.1',
'pkg_path': '/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype',
'scipy_version': '1.3.1',
'sys_executable': '/cbica/home/arikahn/.conda/envs/analysis/bin/python',
'sys_platform': 'linux',
'sys_version': '3.7.4 (default, Aug 13 2019, 20:35:49) \n[GCC 7.3.0]',
'traits_version': '5.1.2'}