nipy / nipype

Workflows and interfaces for neuroimaging packages
https://nipype.readthedocs.org/en/latest/

Multiple SGEGraph jobs causing conflicting batch filenames #3181

Open ariekahn opened 4 years ago

ariekahn commented 4 years ago

Summary

Submitting multiple SGEGraph jobs with the same script name and work directory (but with different parameters) seems to cause a conflict with batch filenames.

For flexibility at runtime, I haven't combined all of my fMRI runs into a single nipype instance; instead I'm submitting a few jobs with separate iterables (perhaps not ideal).

I'm observing that on my local cluster, those jobs will sit in the queue and then (possibly as jobs holding multiple CPU slots finish) several of them are accepted at the same time. Nipype batch filenames appear to be generated with a second-precision timestamp (e.g. 20200303_104010), so when two jobs are accepted within the same second the filenames are not unique, and the jobs clobber each other's script files.

I've modified my own scripts to set up separate work directories for the different instances, which sidesteps the issue, but I wonder whether it would be worth increasing the granularity of the batch-filename timestamps or adding a unique identifier.
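
For illustration, here is a minimal sketch of the collision, assuming (based on the paths in the traceback below) that batch filenames are built from a second-precision timestamp plus the workflow hierarchy and node id; the exact construction inside nipype may differ:

```python
from time import strftime

# Assumption: batch filenames embed a second-precision timestamp plus the
# workflow hierarchy and node id, as the paths in the traceback suggest.
timestamp = strftime("%Y%m%d_%H%M%S")  # e.g. '20200303_104010'
pkl_file = "batch/node_%s_representation_mvpa_selectfiles.b0.pklz" % timestamp

# Two workflow instances that share one work directory and are accepted
# within the same second compute identical paths here and clobber each
# other's batch files.
```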

Actual behavior

200303-10:40:04,137 nipype.utils INFO:
         No new version available.
200303-10:40:04,608 nipype.workflow INFO:
         Generated workflow graph: /cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/graph.png (graph2use=hierarchical, simple_form=True).
200303-10:40:10,68 nipype.workflow INFO:
         Workflow representation_mvpa settings: ['check', 'execution', 'logging', 'monitoring']
Traceback (most recent call last):
  File "/cbica/projects/GraphLearning/project/scripts/mvpa/fsl_representation_mvpa.py", line 332, in <module>
    'qsub_args': '-l h_vmem=10G,s_vmem=9.5G -j y',
  File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/pipeline/engine/workflows.py", line 632, in run
    runner.run(execgraph, updatehash=updatehash, config=self.config)
  File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/pipeline/plugins/base.py", line 583, in run
    create_pyscript(node, updatehash=updatehash, store_exception=False)
  File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/pipeline/plugins/tools.py", line 114, in create_pyscript
    savepkl(pkl_file, dict(node=node, updatehash=updatehash))
  File "/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype/utils/filemanip.py", line 727, in savepkl
    os.rename(tmpfile, filename)
FileNotFoundError: [Errno 2] No such file or directory: '/cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/batch/node_20200303_104010_representation_mvpa_selectfiles.b0.pklz.tmp' -> '/cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/batch/node_20200303_104010_representation_mvpa_selectfiles.b0.pklz'
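
The rename failure follows from the write-then-rename pattern visible in the traceback: each process writes to `<filename>.tmp` and then renames it into place, so when two processes compute the same filename, whichever renames second finds the shared .tmp file already gone. A minimal sketch of that pattern (a hypothetical simplification of savepkl, not its actual source):

```python
import gzip
import os
import pickle

def savepkl_sketch(filename, obj):
    """Hypothetical simplification of nipype's savepkl write-then-rename step."""
    tmpfile = filename + ".tmp"
    with gzip.open(tmpfile, "wb") as f:
        pickle.dump(obj, f)
    # If another process computed the same `filename`, it shares this .tmp
    # path too; whichever process renames second raises FileNotFoundError,
    # matching the traceback above.
    os.rename(tmpfile, filename)
```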

Expected behavior

200303-10:40:04,139 nipype.utils INFO:
         No new version available.
200303-10:40:05,595 nipype.workflow INFO:
         Generated workflow graph: /cbica/projects/GraphLearning/project/work/mvpa/representation_mvpa/graph.png (graph2use=hierarchical, simple_form=True).
200303-10:40:10,49 nipype.workflow INFO:
         Workflow representation_mvpa settings: ['check', 'execution', 'logging', 'monitoring']

How to replicate the behavior

Submit multiple jobs simultaneously with the same script name and work directory.
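
A minimal replication sketch; the workflow body here is hypothetical, and only the shared workflow name, the shared base_dir, and the SGEGraph plugin matter:

```python
import nipype.pipeline.engine as pe
from nipype.interfaces.utility import IdentityInterface

def build_workflow(param):
    # Every submission uses the same workflow name and work directory
    # (the path is hypothetical), which is what triggers the collision.
    wf = pe.Workflow(name="representation_mvpa",
                     base_dir="/path/to/shared/work/mvpa")
    node = pe.Node(IdentityInterface(fields=["param"]), name="selectfiles")
    node.inputs.param = param
    wf.add_nodes([node])
    return wf

# Run this script several times in parallel (e.g. one qsub per parameter).
# If SGE accepts two of the jobs within the same second, their batch
# filenames collide.
build_workflow("run-1").run(
    plugin="SGEGraph",
    plugin_args={"qsub_args": "-l h_vmem=10G,s_vmem=9.5G -j y"},
)
```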


Platform details:

(analysis) [arikahn@cubic-login2 logs]$ python -c "import nipype; from pprint import pprint; pprint(nipype.get_info())"
200303-11:00:10,768 nipype.utils INFO:
     No new version available.
{'commit_hash': 'c5ce0e2c9',
 'commit_source': 'installation',
 'networkx_version': '2.3',
 'nibabel_version': '2.5.1',
 'nipype_version': '1.4.2',
 'numpy_version': '1.18.1',
 'pkg_path': '/cbica/home/arikahn/.conda/envs/analysis/lib/python3.7/site-packages/nipype',
 'scipy_version': '1.3.1',
 'sys_executable': '/cbica/home/arikahn/.conda/envs/analysis/bin/python',
 'sys_platform': 'linux',
 'sys_version': '3.7.4 (default, Aug 13 2019, 20:35:49) \n[GCC 7.3.0]',
 'traits_version': '5.1.2'}


ariekahn commented 4 years ago

It looks like crash files use uuid.uuid4() in addition to a datetime for naming; would that make sense here?
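
A hedged sketch of that suggestion: appending a uuid4 fragment to the timestamp (mirroring the crash-file naming) would give simultaneously accepted jobs distinct batch filenames. The suffix construction here is illustrative, not nipype's actual code:

```python
import uuid
from time import strftime

# Illustrative only: combine the second-precision timestamp with a uuid4
# fragment so two jobs accepted in the same second still get unique names.
suffix = "%s_%s" % (strftime("%Y%m%d_%H%M%S"), uuid.uuid4().hex[:8])
pkl_file = "batch/node_%s_representation_mvpa_selectfiles.b0.pklz" % suffix
```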