zran_seek returned error: -1

Irisfee commented 5 years ago

Hello!

I am using the same command to run 4 subjects with fMRIPrep. Three of them were preprocessed successfully, but one subject stopped partway through.

The error is:

Traceback (most recent call last):
  File "/usr/local/miniconda/bin/fmriprep", line 11, in <module>
    sys.exit(main())
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/cli/run.py", line 342, in main
    fmriprep_wf.run(**plugin_settings)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/engine/workflows.py", line 595, in run
    runner.run(execgraph, updatehash=updatehash, config=self.config)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/plugins/base.py", line 162, in run
    self._clean_queue(jobid, graph, result=result))
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/plugins/base.py", line 224, in _clean_queue
    raise RuntimeError("".join(result['traceback']))
RuntimeError: Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 471, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 555, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/interfaces/base/core.py", line 522, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/interfaces/nilearn.py", line 126, in _run_interface
    new_nii = concat_imgs(self.inputs.in_files, dtype=self.inputs.dtype)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nilearn/_utils/niimg_conversions.py", line 449, in concat_niimgs
    niimg = check_niimg(niimg, ensure_ndim=ndim)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nilearn/_utils/niimg_conversions.py", line 271, in check_niimg
    niimg = load_niimg(niimg, dtype=dtype)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nilearn/_utils/niimg.py", line 116, in load_niimg
    dtype = _get_target_dtype(niimg.get_data().dtype, dtype)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/dataobj_images.py", line 202, in get_data
    data = np.asanyarray(self._dataobj)
  File "/usr/local/miniconda/lib/python3.6/site-packages/numpy/core/numeric.py", line 544, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/arrayproxy.py", line 356, in __array__
    raw_data = self.get_unscaled()
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/arrayproxy.py", line 351, in get_unscaled
    mmap=self._mmap)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/volumeutils.py", line 525, in array_from_file
    infile.seek(offset)
  File "indexed_gzip/indexed_gzip.pyx", line 430, in indexed_gzip.indexed_gzip._IndexedGzipFile.seek
indexed_gzip.indexed_gzip.ZranError: zran_seek returned error: -1

The command I used:

singularity run -e --bind /projects/kuhl_lab/yzhao17:/projects/kuhl_lab/yzhao17 /projects/kuhl_lab/yzhao17/Image/fmriprep.simg /projects/kuhl_lab/yzhao17/DIIN/bids_data /projects/kuhl_lab/yzhao17/DIIN/derivatives participant --participant_label 01 --output-space T1w template fsaverage6 --medial-surface-nan -w /projects/kuhl_lab/yzhao17/DIIN/bids_data/works --resource-monitor --notrack --stop-on-first-crash

The data passed BIDS validation, and I don't think there is any difference between the subjects. I am wondering what is going on.

Thank you so much!

effigies commented 5 years ago

Hi, what version of fMRIPrep are you using? This may have been addressed in #1356, which was included in the 1.2.1 release.

Related: #801.

Irisfee commented 5 years ago

Hi, I am using 1.2.1. I have looked at #1356, but I am still unsure how to solve this. Is it possible for me to change that setting myself?

effigies commented 5 years ago

Ah, then it didn't fix it. That's a shame. Have you tried re-running? Our best guess has been that it's an NFS cache-related issue, and sometimes re-running resolves it.

cc @pauldmccarthy Just a heads up that this appears under default nibabel settings (we no longer set KEEP_FILE_OPEN_DEFAULT='auto').

Irisfee commented 5 years ago

I have rerun it, but I got the same error again.

Irisfee commented 5 years ago

Should I clean up the work directory before re-running?

effigies commented 5 years ago

That sometimes helps, but I'm not sure that it would, here. Can you determine the node in which it failed? There are several nodes that use that interface; if we can identify which one, we can try deleting just its inputs, to save you some time.

Irisfee commented 5 years ago

Sorry, I am a complete newbie to nipype workflows and I don't know how to find the node that failed. Here is the output file: df_job001.txt

Looking forward to your further suggestions!

emdupre commented 5 years ago

It looks like the error is on the merge node in the bold_t1_trans_wf:

181113-18:49:55,165 nipype.workflow WARNING: [Node] Error on "fmriprep_wf.single_subject_01_wf.func_preproc_task_retrieval_run_02_wf.bold_t1_trans_wf.merge" (/projects/kuhl_lab/yzhao17/DIIN/bids_data/works/fmriprep_wf/single_subject_01_wf/func_preproc_task_retrieval_run_02_wf/bold_t1_trans_wf/merge)

Could you look in that folder in the working directory, @Irisfee, and remove its listed inputs before re-running?
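
For anyone else unsure how to locate the failing node: nipype writes a `[Node] Error on "<node name>" (<working directory>)` line like the one quoted above. A minimal sketch for pulling those lines out of a log file, assuming that line format holds (the name `find_failed_nodes` is made up for illustration and is not part of nipype or fMRIPrep):

```python
import re

# Matches lines such as:
#   [Node] Error on "fmriprep_wf....merge" (/path/to/work/.../merge)
NODE_ERROR = re.compile(
    r'\[Node\] Error on "(?P<name>[^"]+)" \((?P<workdir>[^)]+)\)'
)


def find_failed_nodes(log_text):
    """Return (node_name, working_dir) pairs for every node error in the log text."""
    return [
        (match.group("name"), match.group("workdir"))
        for match in NODE_ERROR.finditer(log_text)
    ]
```

Reading the job output into a string and passing it to `find_failed_nodes` gives both the node name and the folder to inspect.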

effigies commented 5 years ago

   [Node] Error on "fmriprep_wf.single_subject_01_wf.func_preproc_task_retrieval_run_02_wf.bold_t1_trans_wf.merge" (/projects/kuhl_lab/yzhao17/DIIN/bids_data/works/fmriprep_wf/single_subject_01_wf/func_preproc_task_retrieval_run_02_wf/bold_t1_trans_wf/merge)

So perhaps just delete /projects/kuhl_lab/yzhao17/DIIN/bids_data/works/fmriprep_wf/single_subject_01_wf/func_preproc_task_retrieval_run_02_wf/bold_t1_trans_wf
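
If it is easier to script than to delete by hand, something along these lines should work. It is only a sketch: `remove_node_workdir` is a hypothetical helper, not an fMRIPrep command, and it defaults to a dry run so that nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def remove_node_workdir(workdir, dry_run=True):
    """Delete a node's working directory so it is recomputed on the next run.

    Returns True if the directory exists (and was deleted when dry_run is
    False), and False if there was nothing to delete.
    """
    path = Path(workdir)
    if not path.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(str(path))
    return True
```

Call it once with the default `dry_run=True` to confirm the path is right, then again with `dry_run=False` before re-running fMRIPrep with the same `-w` directory.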

Irisfee commented 5 years ago

I re-ran it once more, and this time it finished successfully. Thank you very much, @effigies and @emdupre.

I am just wondering whether this bug is completely random, or whether it is more likely to happen in certain situations. I ran 4 subjects together, and I hit this problem twice, both times on the same subject.

effigies commented 5 years ago

I believe it's actually an NFS issue. We can't reliably reproduce it. If deleting the inputs resolved the issue, I'd guess that perhaps it's actually the step before merge that's producing malformed files. Next time you see this, could you zip up that entire directory, and we can try to see if there are some malformed files there?

Irisfee commented 5 years ago

For unrelated reasons, I reran all the subjects, and I still have the same trouble with that subject. This time, deleting the failed node's folder does not solve it. I can share the functional data with you, but to protect the subject I can't share the anatomical data without skull-stripping. Would that help?

effigies commented 5 years ago

No, we can't reproduce without the anatomicals. But if you're able to tar the bold_t1_trans_wf directory, I could at least inspect the intermediate results.
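
In case plain `tar` is awkward on the cluster, the same archive can be made from Python with the standard library. A small sketch (the helper name `archive_directory` is just for illustration):

```python
import tarfile
from pathlib import Path


def archive_directory(directory, archive_path):
    """Write a gzip-compressed tarball of `directory` and return the archive path."""
    directory = Path(directory)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname stores only the directory's own name, not its absolute path
        tar.add(str(directory), arcname=directory.name)
    return Path(archive_path)
```

The archive path should sit outside the directory being archived, otherwise the tarball would try to include itself.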

Irisfee commented 5 years ago

I have sent the files to your Gmail address. Thanks!

effigies commented 5 years ago

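import glob

import nibabel as nb

# Try to load every intermediate volume and report the ones that fail to read.
for img in glob.glob('bold_to_t1w_transform/*.nii.gz'):
    try:
        # Force the full data array to be read from disk
        nb.load(img).get_data()
    except Exception as err:
        print(img)
        print(err)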

bold_to_t1w_transform/vol0109_xform-00109.nii.gz
zran_seek returned error: -1

However, if I uninstall indexed_gzip, I can read it fine. Reinstalling, I get the error again.
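
One way to tell a damaged file apart from a reader problem is to decompress the whole stream with Python's standard-library gzip module, which does not involve indexed_gzip at all. A minimal sketch (the helper name `gzip_is_readable` is made up for illustration):

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error.

    Only the standard-library gzip module is used, so the result reflects the
    file itself rather than any indexed reader.
    """
    try:
        with gzip.open(path, "rb") as f:
            # Read to the end so truncation and checksum errors surface
            while f.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

A file that passes this check but still triggers the zran_seek error points at the indexed reader rather than at the step that wrote the file.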

@pauldmccarthy Here's a failing file: vol0109_xform-00109.nii.gz

I can open an issue over in indexed_gzip, if you like.

effigies commented 5 years ago

I wonder if we can do any kind of fallback, so that we can use IndexedGzipFile when it works, and use GzipFile when we hit a zran_seek error.
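
Roughly what I have in mind, as a sketch only. This is not how nibabel opens files today; `read_at` is a hypothetical helper, and it catches any exception from the indexed read rather than relying on a specific error class.

```python
import gzip

try:
    import indexed_gzip  # optional; only used when it is installed
except ImportError:
    indexed_gzip = None


def read_at(path, offset, size):
    """Read `size` decompressed bytes starting at `offset` in a gzip file.

    Tries IndexedGzipFile first when indexed_gzip is available, and falls back
    to the standard-library GzipFile if the indexed read fails for any reason.
    """
    if indexed_gzip is not None:
        try:
            with indexed_gzip.IndexedGzipFile(filename=path) as f:
                f.seek(offset)
                return f.read(size)
        except Exception:
            # e.g. "zran_seek returned error: -1"; retry below with plain gzip
            pass
    with gzip.open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

The fallback gives up fast random access for that one read, since GzipFile has to decompress from the start of the file to reach the offset.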

pauldmccarthy commented 5 years ago

@effigies thanks - super busy right now, but I will look at this as soon as I get a chance.

effigies commented 5 years ago

@Irisfee Do you think you could just delete /projects/kuhl_lab/yzhao17/DIIN/bids_data/works/fmriprep_wf/single_subject_01_wf/func_preproc_task_retrieval_run_04_wf/bold_t1_trans_wf, for now? Until we can either get a fix into indexed_gzip or nibabel, I think this is just going to be a stochastic bug.

@pauldmccarthy Sounds good. I opened an issue on your page, so that you don't have to dig around over here to find it.

Irisfee commented 5 years ago

I have deleted that folder and rerun twice, but I got the same error at this node both times. So it seems that deleting the node's working directory no longer solves the issue.

hstojic commented 5 years ago

Hello,

I got the same error with one of my datasets, although in a slightly different part of the pipeline:

181121-02:02:30,129 nipype.workflow WARNING:
     [Node] Error on "fmriprep_wf.single_subject_s016_wf.func_preproc_task_fnclearning_run_04_wf.bold_mni_trans_wf.bold_to_mni_transform" (/scratch/scratch/ucjttoj/fnclearning_fmri/dProcessed/fmriprep_work/fmriprep_wf/single_subject_s016_wf/func_preproc_task_fnclearning_run_04_wf/bold_mni_trans_wf/bold_to_mni_transform)
181121-02:02:30,690 nipype.workflow ERROR:
     Node bold_to_mni_transform failed to run on host node-i00a-002.myriad.ucl.ac.uk.
181121-02:02:30,692 nipype.workflow ERROR:
     Saving crash info to /home/ucjttoj/Scratch/fnclearning_fmri/dProcessed/fmriprep/sub-s016/log/20181120-233348_48bec329-0a82-4f31-be7a-de197281ba15/crash-20181121-020230-ucjttoj-bold_to_mni_transform-a49ac52c-7129-4aae-a683-b36f43774280.txt
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 471, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 555, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/interfaces/base/core.py", line 522, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/interfaces/itk.py", line 143, in _run_interface
    for i, (in_file, in_xfm) in enumerate(zip(in_files, xfms_list))]
  File "/usr/local/miniconda/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/local/miniconda/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/miniconda/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/miniconda/lib/python3.6/site-packages/fmriprep/interfaces/itk.py", line 262, in _applytfms
    runtime = xfm.run().runtime
  File "/usr/local/miniconda/lib/python3.6/site-packages/nipype/interfaces/base/core.py", line 522, in run
    runtime = self._run_interface(runtime)
  File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/interfaces/fixes.py", line 29, in _run_interface
    self.__class__.__name__, __version__))
  File "/usr/local/miniconda/lib/python3.6/site-packages/niworkflows/interfaces/utils.py", line 179, in _copyxform
    newimg = resampled.__class__(resampled.get_data(), orig.affine, header)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/dataobj_images.py", line 202, in get_data
    data = np.asanyarray(self._dataobj)
  File "/usr/local/miniconda/lib/python3.6/site-packages/numpy/core/numeric.py", line 544, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/arrayproxy.py", line 356, in __array__
    raw_data = self.get_unscaled()
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/arrayproxy.py", line 351, in get_unscaled
    mmap=self._mmap)
  File "/usr/local/miniconda/lib/python3.6/site-packages/nibabel/volumeutils.py", line 525, in array_from_file
    infile.seek(offset)
  File "indexed_gzip/indexed_gzip.pyx", line 430, in indexed_gzip.indexed_gzip._IndexedGzipFile.seek
indexed_gzip.indexed_gzip.ZranError: zran_seek returned error: -1

Here is the crash file: crash-s016.txt

I ran it twice, failing the same way both times - but now I will try deleting the files and see whether it succeeds.

effigies commented 5 years ago

The quick fix will be to remove indexed_gzip from our Docker images. Given that this is popping up so frequently now, I think we'll need to in the short term.

effigies commented 5 years ago

@hstojic If you can find a failing file, could you send it along to @pauldmccarthy in pauldmccarthy/indexed_gzip#15?

pauldmccarthy commented 5 years ago

indexed_gzip 0.8.8 is now available, which should hopefully fix this issue - let me know if you are still having problems after upgrading.
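
To check whether an environment already has a release at or above 0.8.8, a quick sketch along these lines may help. The helper names are made up for illustration, and `importlib.metadata` needs Python 3.8 or newer, so on an older interpreter `pip show indexed_gzip` is the simpler route.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.8.8' into (0, 8, 8); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_fix(version, fixed_in="0.8.8"):
    """True if `version` is at least the release mentioned above."""
    return version_tuple(version) >= version_tuple(fixed_in)
```

For example, `installed_version("indexed_gzip")` returns `None` when the package is not installed; otherwise pass the result to `has_fix`.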

nipreps / fmriprep

zran_seek returned error: -1 #1387