nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/

Error raised in `_submit_job` after jobs are submitted #630

Status: Open (opened by arokem 1 year ago)


Hello! I have been running into a curious error when using pydra 0.22, installed in a conda environment on the UW HPC cluster ("Hyak").

The following minimal script reproduces it:

from tempfile import mkdtemp

import pydra

scratch_dir = "/gscratch/escience/arokem/"
# Fresh cache directory under scratch, e.g. /gscratch/escience/arokem/tmp_XXXXXX
cache_dir_tmp = mkdtemp(prefix="tmp_", dir=scratch_dir)

@pydra.mark.task
def task(subject):
    print("I am doing something really simple")

subject_list = ["01", "02"]
# Splitting over "subject" yields one task per list entry
t = task(subject=subject_list, cache_dir=cache_dir_tmp).split("subject")

# sbatch_args sets its own job name (-J), output file (-o) and error file (-e)
sbatch_args = (
    "-J task -p gpu-a40 -A escience --mem=58G --time=10:00:00 "
    "-o /gscratch/escience/arokem/logs/task.out "
    "-e /gscratch/escience/arokem/logs/task.err "
    "--mail-user=arokem@uw.edu --mail-type=ALL"
)

with pydra.Submitter(plugin="slurm", sbatch_args=sbatch_args) as sub:
    sub(runnable=t)
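The script hands Slurm its own `-o` and `-e` paths, so anything that wants to know where a job's stderr ends up has to read them back out of the `sbatch_args` string. Below is a minimal sketch of that parsing, plus filling in Slurm's `%j` (job ID) filename placeholder. `find_flag_value` and `expand_job_id` are hypothetical helpers for illustration, not pydra functions, and whether pydra's Slurm worker does anything like this is an assumption.

```python
import shlex
from typing import Optional


def find_flag_value(sbatch_args: str, short: str, long: str) -> Optional[str]:
    """Return the value of `short VALUE` or `long=VALUE` in sbatch_args (hypothetical helper).

    Other spellings such as `--error VALUE` or `-eVALUE` are not handled in this sketch.
    """
    tokens = shlex.split(sbatch_args)
    for i, tok in enumerate(tokens):
        # Short form: the value is the next token, e.g. "-e /path/task.err"
        if tok == short and i + 1 < len(tokens):
            return tokens[i + 1]
        # Long form with "=", e.g. "--error=/path/task.err"
        if tok.startswith(long + "="):
            return tok[len(long) + 1:]
    return None


def expand_job_id(path: str, jobid: str) -> str:
    """Replace Slurm's %j filename pattern with the actual job ID."""
    return path.replace("%j", jobid)


sbatch_args = (
    "-J task -p gpu-a40 -A escience --mem=58G --time=10:00:00 "
    "-o /gscratch/escience/arokem/logs/task.out "
    "-e /gscratch/escience/arokem/logs/task.err "
    "--mail-user=arokem@uw.edu --mail-type=ALL"
)
print(find_flag_value(sbatch_args, "-e", "--error"))
print(expand_job_id("logs/slurm-%j.err", "12345"))
```

With the script's arguments, the user-supplied error path is recoverable from the string; it contains no `%j`, so substitution would leave it unchanged.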

When run, it raises the following error:

Traceback (most recent call last):
  File "simple.py", line 23, in <module>
    sub(runnable=t)
  File "/gscratch/escience/arokem/afq/lib/python3.8/site-packages/pydra/engine/submitter.py", line 42, in __call__
    self.loop.run_until_complete(self.submit_from_call(runnable, rerun))
  File "/gscratch/escience/arokem/afq/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/gscratch/escience/arokem/afq/lib/python3.8/site-packages/pydra/engine/submitter.py", line 80, in submit_from_call
    await self.expand_runnable(runnable, wait=True, rerun=rerun)
  File "/gscratch/escience/arokem/afq/lib/python3.8/site-packages/pydra/engine/submitter.py", line 128, in expand_runnable
    await asyncio.gather(*futures)
  File "/gscratch/escience/arokem/afq/lib/python3.8/site-packages/pydra/engine/workers.py", line 299, in _submit_job
    self.error[jobid] = error_file.replace("%j", jobid)
AttributeError: 'NoneType' object has no attribute 'replace'

Interestingly, though, the jobs are still submitted and run without any further issues, so this is not causing any real problems for us, apart from making it look as though something went wrong.
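The failing line in the traceback uses `jobid`, so it can only run after `sbatch` has already returned a job ID; that would explain why the jobs run normally even though the submitter reports an error. The minimal sketch below models that ordering with a hypothetical `SketchWorker` class and a fake `sbatch` function (neither is pydra's API), and shows a guarded variant that tolerates a missing error path. Where the `None` comes from is not visible in the traceback; one guess, not verified against the pydra source, is that the worker leaves `error_file` unset when `sbatch_args` already supplies `-e`, as this script does.

```python
from typing import Callable, Dict, List, Optional


class SketchWorker:
    """Hypothetical stand-in for a Slurm worker (NOT pydra's SlurmWorker)."""

    def __init__(self, run_sbatch: Callable[[], str]) -> None:
        self.run_sbatch = run_sbatch  # returns the job ID reported by sbatch
        self.error: Dict[str, Optional[str]] = {}

    def submit_unguarded(self, error_file: Optional[str]) -> str:
        jobid = self.run_sbatch()  # from here on, the job is already queued
        # Bookkeeping happens only after submission; with error_file=None this
        # raises the same AttributeError as in the traceback.
        self.error[jobid] = error_file.replace("%j", jobid)  # type: ignore[union-attr]
        return jobid

    def submit_guarded(self, error_file: Optional[str]) -> str:
        jobid = self.run_sbatch()
        # Substitute %j only when a path is known; otherwise record None.
        self.error[jobid] = error_file.replace("%j", jobid) if error_file else None
        return jobid


submitted: List[str] = []


def fake_sbatch() -> str:
    """Pretend to submit a job and return a made-up job ID."""
    jobid = str(1000 + len(submitted))
    submitted.append(jobid)
    return jobid


worker = SketchWorker(fake_sbatch)
try:
    worker.submit_unguarded(error_file=None)
except AttributeError as exc:
    print("raised:", exc)  # the fake job was "submitted" before this was raised
worker.submit_guarded(error_file=None)
print("submitted jobs:", submitted)
```

In the unguarded call the job is recorded as submitted even though the exception propagates, which matches the behaviour described above; the guarded call records the job without raising.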