When running locally, I have had to datalad unlock my FreeSurfer directories to keep git-annex from complaining. I'm not sure if this is related.
Hi, I'm running into a potentially similar issue using 23.0.2. I went back to older versions: 22.0.0 seems fine, but the error appears in 22.0.1, so I suspect the fsaverage folder removal and update caused the issue. Also, it happened when I ran 3 or 5 subjects in parallel with Slurm (from the logs, the node error happened within 3 s), but did not happen when I ran just one subject.
Maybe it's caused by multiple instances trying to modify the fsaverage folder at the same time? (Oddly, I did not hit this issue when running 5 subjects in parallel (node ran within 3 s) on a local server with SGE using 23.0.2, perhaps because the local server has much better I/O performance?)
Related PRs: nipreps/fmriprep#2847, nipreps/niworkflows#739
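For what it's worth, the mix of "File exists" and "No such file or directory" errors in the traceback below is what you would expect when several processes race through shutil.copytree into one shared destination. A minimal self-contained sketch of the suspected race (toy paths, not fMRIPrep's actual code):

```python
import multiprocessing as mp
import shutil
import tempfile
from pathlib import Path

def copy_tree(args):
    src, dst, worker = args
    try:
        # Like BIDSFreeSurferDir, every process copies the same source
        # tree into one shared destination; whoever loses the race hits
        # "File exists" part-way through.
        shutil.copytree(src, dst)
        return f"worker {worker}: ok"
    except (shutil.Error, FileExistsError) as err:
        return f"worker {worker}: {err}"

if __name__ == "__main__":
    # Build a tiny stand-in for $FREESURFER_HOME/subjects/fsaverage.
    base = Path(tempfile.mkdtemp())
    src = base / "fsaverage"
    (src / "label").mkdir(parents=True)
    (src / "label" / "lh.PALS_B12_Lobes.annot").write_text("dummy")
    dst = base / "out" / "fsaverage"
    dst.parent.mkdir()
    # Three workers copy into the same destination at once.
    with mp.Pool(3) as pool:
        for result in pool.map(copy_tree, [(src, dst, i) for i in range(3)]):
            print(result)
```

The full debug log from one of my failing runs: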
230425-22:11:58,449 nipype.workflow DEBUG:
Allocating fmriprep_23_0_wf.fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a ID=0 (0.20GB, 1 threads). Free: 226.02GB, 63 threads.
230425-22:11:58,449 nipype.workflow DEBUG:
[Node] Not cached "/wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a".
230425-22:11:58,449 nipype.workflow DEBUG:
Checking hash "fmriprep_23_0_wf.fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a" locally: cached=False, updated=False.
230425-22:11:58,449 nipype.workflow DEBUG:
Running node fmriprep_23_0_wf.fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a on master thread
230425-22:11:58,449 nipype.workflow INFO:
[Node] Setting-up "fmriprep_23_0_wf.fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a" in "/wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a".
230425-22:11:58,450 nipype.workflow DEBUG:
[Node] Not cached "/wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a".
230425-22:11:58,451 nipype.utils DEBUG:
Removing contents of /wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a
230425-22:11:58,453 nipype.workflow DEBUG:
[Node] Writing pre-exec report to "/wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a/_report/report.rst"
230425-22:11:58,487 nipype.workflow INFO:
[Node] Executing "fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a" <niworkflows.interfaces.bids.BIDSFreeSurferDir>
230425-22:12:06,385 nipype.workflow INFO:
[Node] Finished "fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a", elapsed time 7.620329s.
230425-22:12:06,385 nipype.workflow DEBUG:
Saving results file: '/wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a/result_fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a.pklz'
230425-22:12:06,385 nipype.workflow WARNING:
Storing result file without outputs
230425-22:12:06,401 nipype.workflow WARNING:
[Node] Error on "fmriprep_23_0_wf.fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a" (/wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a)
230425-22:12:06,402 nipype.workflow DEBUG:
Clearing 0 from queue
230425-22:12:06,403 nipype.utils DEBUG:
Loading pkl: /wd/fmriprep_23_0_wf/fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a/result_fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a.pklz
230425-22:12:06,403 nipype.workflow ERROR:
Node fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a failed to run on host nc30408.narval.calcul.quebec.
230425-22:12:06,404 nipype.workflow ERROR:
Saving crash info to /out/logs/crash-20230425-221206-zqliu-fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a-61e465b5-0e0f-4e49-a8f7-355674cc43b4.txt
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
self.procs[jobid].run(updatehash=updatehash)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
result = self._run_interface(execute=True)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
return self._run_command(execute)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node fsdir_run_20230425_212539_24cd804c_9cdf_4609_b269_e975ab353e4a.
Traceback:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 397, in run
runtime = self._run_interface(runtime)
File "/opt/conda/lib/python3.9/site-packages/niworkflows/interfaces/bids.py", line 969, in _run_interface
shutil.copytree(source, dest, copy_function=shutil.copy)
File "/opt/conda/lib/python3.9/shutil.py", line 566, in copytree
return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
File "/opt/conda/lib/python3.9/shutil.py", line 522, in _copytree
raise Error(errors)
shutil.Error: [('/opt/freesurfer/subjects/fsaverage/label/lh.PALS_B12_Lobes.annot', '/out/sourcedata/freesurfer/fsaverage/label/lh.PALS_B12_Lobes.annot', "[Errno 2] No such file or directory: '/out/sourcedata/freesurfer/fsaverage/label/lh.PALS_B12_Lobes.annot'"), ('/opt/freesurfer/subjects/fsaverage/mri', '/out/sourcedata/freesurfer/fsaverage/mri', "[Errno 17] File exists: '/out/sourcedata/freesurfer/fsaverage/mri'"), ('/opt/freesurfer/subjects/fsaverage/surf', '/out/sourcedata/freesurfer/fsaverage/surf', "[Errno 17] File exists: '/out/sourcedata/freesurfer/fsaverage/surf'")]
230425-22:12:06,461 nipype.workflow CRITICAL:
fMRIPrep failed: Traceback (most recent call last):
    [... same traceback as above ...]
> Also, it happened when I ran 3 or 5 subjects in parallel with Slurm (from the logs, the node error happened within 3 s), but did not happen when I ran just one subject.
This will be the problem. I suppose we can try to add a lockfile to prevent it, but that's really chancy on networked filesystems.
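To make the idea concrete, a rough sketch of a lockfile around the copy, using the third-party filelock package (illustrative only, not fMRIPrep's actual code; the helper name is made up):

```python
import shutil
from pathlib import Path

from filelock import FileLock  # third-party: pip install filelock

def copy_fsaverage_once(source: Path, dest: Path) -> None:
    """Serialize the fsaverage copy across concurrent runs (hypothetical helper)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Sibling processes block here until the first copy finishes.
    with FileLock(str(dest) + ".lock", timeout=600):
        if not dest.exists():
            shutil.copytree(source, dest)
```

Whether the lock is actually honored across compute nodes depends on the cluster filesystem, which is exactly the chancy part.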
@liuzhenqi77 If you haven't tried this already, a workaround that worked for me is simply to copy fsaverage into the output directory before running fMRIPrep.
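Roughly, before submitting any jobs (paths are illustrative; the destination matches the /out/sourcedata/freesurfer layout in the logs above):

```python
import os
import shutil
from pathlib import Path

# Seed the output dataset with fsaverage once, up front, so that no
# fMRIPrep process has to create it during the run.
src = Path(os.environ["FREESURFER_HOME"]) / "subjects" / "fsaverage"
dest = Path("/out/sourcedata/freesurfer/fsaverage")
dest.parent.mkdir(parents=True, exist_ok=True)
if not dest.exists():
    shutil.copytree(src, dest)
```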
Thanks for the update! Yes I'm doing something similar.
I'm going to close this issue since we have a reliable workaround. I think @effigies plans to eventually provide a fix in fMRIPrep.
I haven't actually tested these datasets with a previous version, so it's just a guess that it's related to v23.0.0. I checked my scripts, and I no longer do anything unusual with the fsaverage directories (I used to pre-load them before running).
ds000030 is giving me the following error. 'fsaverage/label/lh.aparc.a2005s.annot' is indeed not present.
And ds002843 is giving me this error: