pBFSLab / DeepPrep

DeepPrep: An accelerated, scalable, and robust pipeline for neuroimaging preprocessing empowered by deep learning
https://deepprep.readthedocs.io
Apache License 2.0

Process `anat_wf:anat_sphere` terminated for an unknown reason -- Likely it has been terminated by the external system #132

Closed · JChRoy closed this issue 1 month ago

JChRoy commented 1 month ago

Dear pBFSLab team,

Thanks a lot for this new pipeline. I am trying to run it from the Singularity image on our HPC. It works well, with resources allocated for each process, until the anat_sphere step (mris_sphere), which fails with the following error message:

ERROR ~ Error executing process > 'anat_wf:anat_sphere (sub-1001)'
Caused by: Process anat_wf:anat_sphere (sub-1001) terminated for an unknown reason -- Likely it has been terminated by the external system

The sacct report shows no sign of excessive memory use.

I also tried different combinations of parameters in the sbatch script and on a single node, adjusting --mem-per-cpu or --mem-per-gpu (on a GPU partition), always with the same error.
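For reference, the sacct check mentioned above was done along these lines (the job ID is the one from the log below; the field list is illustrative):

```shell
# Inspect the killed job: MaxRSS vs. ReqMem shows whether memory was the
# limit, and Elapsed vs. Timelimit shows whether the walltime was hit.
sacct -j 6803803 \
  --format=JobID,JobName%30,State,ExitCode,Elapsed,Timelimit,ReqMem,MaxRSS
```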

On the other hand, running mris_sphere directly from the Singularity image produces the expected outputs. Do you have any idea how to fix this issue?
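The direct test looked roughly like this (a sketch: the subject, hemisphere, and surf paths are illustrative; the image path matches the one passed via --container):

```shell
# Run mris_sphere manually inside the same container image, pointing
# SUBJECTS_DIR at the recon output; mris_sphere takes an inflated surface
# as input and writes the spherical surface as output.
singularity exec -e \
  -B ${output_dir} \
  --env SUBJECTS_DIR=${output_dir}/Recon \
  ${TEST_DIR}/DeepPrep/deepprep_23.1.0.sif \
  mris_sphere ${output_dir}/Recon/sub-1001/surf/lh.inflated \
              ${output_dir}/Recon/sub-1001/surf/lh.sphere
```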

Best, Jean-Charles

The script I launch from the login node:

sbatch deepprep_gpu.sh


#!/bin/bash
#SBATCH --job-name=deepprep_job
#SBATCH --partition=gpu
#SBATCH --output=deepprep_job_%j.out
#SBATCH --error=deepprep_job_%j.err
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=30 
#SBATCH --mem=40G

export TEST_DIR=/data/gpfs-1/groups/ag_schumann/work/jero14_c
export BIDS_DIR=/data/gpfs-1/groups/ag_schumann/work/jero14_c/test
export output_dir=/data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives
export FS_LICENSE=/data/gpfs-1/groups/ag_schumann/work/jero14_c/scripts/license.txt

${TEST_DIR}/DeepPrep/deepprep/deepprep.sh \
${BIDS_DIR} \
${output_dir} \
participant \
--skip_bids_validation \
--anat_only \
--subjects_dir ${output_dir}/Recon \
--deepprep_home ${TEST_DIR}/DeepPrep \
--fs_license_file ${FS_LICENSE} \
--executor cluster \
--container ${TEST_DIR}/DeepPrep/deepprep_23.1.0.sif \
--config_file ${TEST_DIR}/DeepPrep/deepprep.slurm.gpu.config \
--debug \
--resume

The config_file: deepprep.slurm.gpu.config


//deepprep.slurm.gpu.config

enabled = true
singularity.enabled = true
singularity.autoMounts = true
singularity.runOptions = '-e \
--home ${output_dir}/WorkDir/home \
--env TEMP=${output_dir}/WorkDir/tmp \
--env TMP=${output_dir}/WorkDir/tmp  \
--env TMPDIR=${output_dir}/WorkDir/tmp \
-B ${BIDS_DIR} \
-B ${output_dir} \
-B ${FS_LICENSE}:/opt/freesurfer/license.txt \
'

process {
    //errorStrategy = 'ignore'

    executor = 'slurm'

    queue = 'standard'

    clusterOptions = { " --chdir=${nextflow_work_dir}" }

    container = "${container}"

     withLabel: with_gpu {
         queue = 'gpu'
         clusterOptions = { "--gres=gpu:tesla:4 --time=00:30:00" }
         singularity.runOptions = '-e --nv -B ${FS_LICENSE}:/opt/freesurfer/license.txt'
     }
}

Here is the .nextflow.log output:

May-29 17:22:54.695 [main] DEBUG nextflow.cli.Launcher - $> nextflow run /data/gpfs-1/groups/ag_schumann/work/jero14_c/DeepPrep/deepprep/nextflow/deepprep.nf -c /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/$
May-29 17:22:54.870 [main] DEBUG nextflow.cli.CmdRun - N E X T F L O W  ~  version 24.04.1
May-29 17:22:54.911 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/data/gpfs-1/users/jero14_c/.nextflow/plugins; core-plugins: nf-amazon@2.5.2,nf-azure@1.6.0,nf-cloudcach$
May-29 17:22:54.923 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-29 17:22:54.924 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-29 17:22:54.927 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.10.0 in 'deployment' mode
May-29 17:22:54.941 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
May-29 17:22:54.967 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/run.config
May-29 17:22:54.969 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/run.config
May-29 17:22:54.997 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /data/gpfs-1/users/jero14_c/.nextflow/secrets/store.json
May-29 17:22:55.000 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@41477a6d] - activable => nextflow.secret.LocalSecretsProvider@41477a6d
May-29 17:22:55.014 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
May-29 17:22:55.619 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 by global default
May-29 17:22:55.644 [main] DEBUG nextflow.cli.CmdRun - Launching `/data/gpfs-1/groups/ag_schumann/work/jero14_c/DeepPrep/deepprep/nextflow/deepprep.nf` [big_leavitt] DSL2 - revision: 808d7df25a
May-29 17:22:55.645 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
May-29 17:22:55.646 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
May-29 17:22:55.708 [main] DEBUG nextflow.Session - Session UUID: 7e4d9370-5856-4451-b4c2-e44f51869db6
May-29 17:22:55.709 [main] DEBUG nextflow.Session - Run name: big_leavitt
May-29 17:22:55.709 [main] DEBUG nextflow.Session - Executor pool size: 30
May-29 17:22:55.719 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
May-29 17:22:55.747 [main] DEBUG nextflow.cli.CmdRun -
  Version: 24.04.1 build 5913
  Created: 20-05-2024 09:48 UTC (11:48 CEST)
  System: Linux 4.18.0-348.2.1.el8_5.x86_64
  Runtime: Groovy 4.0.21 on OpenJDK 64-Bit Server VM 21.0.3+9-LTS
  Encoding: UTF-8 (UTF-8)
  Process: 629777@hpc-gpu-1 [172.16.58.101]
  CPUs: 30 - Mem: 20 GB (19.8 GB) - Swap: 0 (0) - Virtual threads ON
May-29 17:22:55.793 [main] DEBUG nextflow.Session - Work-dir: /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow [gpfs]
May-29 17:22:55.841 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-29 17:22:55.854 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-29 17:22:55.920 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-29 17:22:55.932 [main] DEBUG nextflow.util.CustomPoolFactory - Creating virtual thread pool
May-29 17:22:56.177 [main] DEBUG nextflow.Session - Session start
May-29 17:22:58.041 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-29 17:22:58.145 [main] INFO  nextflow.script.BaseScript - DEBUG: params           : [profile:standard, bids_dir:/data/gpfs-1/groups/ag_schumann/work/jero14_c/test, subjects_dir:/data/gpfs-1/groups/ag_schumann/work/jero14_c/derivativ$
May-29 17:22:58.166 [main] INFO  nextflow.script.BaseScript - INFO: bids_dir           : /data/gpfs-1/groups/ag_schumann/work/jero14_c/test
May-29 17:22:58.167 [main] INFO  nextflow.script.BaseScript - INFO: output_dir         : /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives
May-29 17:22:58.168 [main] INFO  nextflow.script.BaseScript - INFO: subjects_dir       : /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/Recon
May-29 17:22:58.237 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.237 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.249 [main] DEBUG nextflow.executor.Executor - [warm up] executor > slurm
May-29 17:22:58.257 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'slurm' > capacity: 100; pollInterval: 5s; dumpInterval: 5m
May-29 17:22:58.260 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: slurm)
May-29 17:22:58.262 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
May-29 17:22:58.372 [main] INFO  nextflow.script.BaseScript - INFO: anat preprocess ONLY
May-29 17:22:58.388 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.389 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.411 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.411 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.413 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [1]
May-29 17:22:58.435 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.435 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.445 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:with_gpu` matches labels `with_gpu` for process with name anat_wf:anat_segment
May-29 17:22:58.447 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.447 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.457 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.458 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.485 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.486 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.500 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.500 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.510 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.511 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.522 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.522 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.537 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.537 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.557 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.557 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.574 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.575 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.582 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.582 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.598 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.598 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.600 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [0]
May-29 17:22:58.607 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.607 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.609 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [0]
May-29 17:22:58.616 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.616 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.617 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [0]
May-29 17:22:58.624 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.624 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.625 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [0]
May-29 17:22:58.632 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.632 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.633 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [0]
May-29 17:22:58.640 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.640 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.641 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [0]
May-29 17:22:58.648 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.648 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.649 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [0]
May-29 17:22:58.661 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:with_gpu` matches labels `with_gpu` for process with name anat_wf:anat_fastcsr_levelset
May-29 17:22:58.663 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.663 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.665 [main] DEBUG nextflow.processor.TaskProcessor - Creating *combiner* operator for each param(s) at index(es): [4]
May-29 17:22:58.687 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.687 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.701 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.701 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.728 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.728 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.743 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.743 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.751 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.752 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.773 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.774 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.785 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.785 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.798 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.806 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.806 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.819 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:with_gpu` matches labels `with_gpu` for process with name anat_wf:anat_sphere_register
May-29 17:22:58.820 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.820 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.832 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.832 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.847 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.847 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.882 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.882 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.903 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.903 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.936 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.936 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.961 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.961 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:58.994 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:58.994 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:59.029 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:59.030 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:59.052 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:59.053 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:59.077 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:59.077 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:59.107 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:59.107 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:59.116 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: slurm
May-29 17:22:59.117 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'slurm'
May-29 17:22:59.150 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: bold_T1_to_2mm, bold_bbregister, split_hemi_aseg_presurf_mgz, anat_cortex_label, anat_fill, bold_bbregister_to_native, anat_wf:anat_hyporelabel, anat_wf:$
May-29 17:22:59.150 [main] DEBUG nextflow.Session - Igniting dataflow network (54)
May-29 17:22:59.151 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > deepprep_init
May-29 17:22:59.151 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_get_t1w_file_in_bids
May-29 17:22:59.151 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_create_subject_orig_dir
May-29 17:22:59.152 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_create_subject_orig_dir
May-29 17:22:59.152 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_motioncor
May-29 17:22:59.152 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_segment
May-29 17:22:59.152 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_reduce_to_aseg
May-29 17:22:59.152 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_N4_bias_correct
May-29 17:22:59.153 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_talairach_and_nu
May-29 17:22:59.153 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_T1
May-29 17:22:59.153 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_brainmask
May-29 17:22:59.154 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_paint_cc_to_aseg
May-29 17:22:59.154 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_fill
May-29 17:22:59.155 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:subject_id_hemi_lh
May-29 17:22:59.156 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:subject_id_hemi_rh
May-29 17:22:59.156 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_orig_mgz
May-29 17:22:59.156 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_orig_mgz
May-29 17:22:59.157 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_rawavg_mgz
May-29 17:22:59.157 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_rawavg_mgz
May-29 17:22:59.158 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_brainmask_mgz
May-29 17:22:59.158 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_brainmask_mgz
May-29 17:22:59.159 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_aseg_presurf_mgz
May-29 17:22:59.159 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_aseg_presurf_mgz
May-29 17:22:59.159 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_brain_finalsurfs_mgz
May-29 17:22:59.159 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_brain_finalsurfs_mgz
May-29 17:22:59.159 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_wm_mgz
May-29 17:22:59.159 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_wm_mgz
May-29 17:22:59.160 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_filled_mgz
May-29 17:22:59.160 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:split_hemi_filled_mgz
May-29 17:22:59.160 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_fastcsr_levelset
May-29 17:22:59.160 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_fastcsr_levelset
May-29 17:22:59.160 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_fastcsr_mksurface
May-29 17:22:59.160 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_autodet_gwstats
May-29 17:22:59.161 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_white_surface
May-29 17:22:59.161 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_cortex_label
May-29 17:22:59.161 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_cortex_hipamyg_label
May-29 17:22:59.161 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_hyporelabel
May-29 17:22:59.161 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_smooth_inflated
May-29 17:22:59.161 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_curv_stats
May-29 17:22:59.162 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_sphere
May-29 17:22:59.162 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_sphere_register
May-29 17:22:59.162 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_jacobian
May-29 17:22:59.162 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_cortparc_aparc
May-29 17:22:59.162 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_pial_surface
May-29 17:22:59.162 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_pctsurfcon
May-29 17:22:59.162 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_parcstats
May-29 17:22:59.163 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_ribbon
May-29 17:22:59.163 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_apas2aseg
May-29 17:22:59.163 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:anat_aparc2aseg
May-29 17:22:59.163 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:qc_plot_volsurf
May-29 17:22:59.163 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:qc_plot_surfparc
May-29 17:22:59.163 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:qc_plot_aparc_aseg
May-29 17:22:59.164 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > anat_wf:qc_anat_create_report
May-29 17:22:59.164 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_7029c4db07bafbd9: /data/gpfs-1/groups/ag_schumann/work/jero14_c/DeepPrep/deepprep/nextflow/deepprep.nf
May-29 17:22:59.164 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
May-29 17:22:59.164 [main] DEBUG nextflow.Session - Session await
May-29 17:22:59.447 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process deepprep_init > jobId: 6803798; workDir: /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/7d/c329b43b7c16$
May-29 17:22:59.447 [Task submitter] INFO  nextflow.Session - [7d/c329b4] Submitted process > deepprep_init
May-29 17:23:18.313 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 6803798; id: 1; name: deepprep_init; status: COMPLETED; exit: 0; error: -; workDir: /data/gpfs-1/groups/ag_schumann/work/jero1$
May-29 17:23:18.396 [Actor Thread 320] INFO  nextflow.processor.TaskProcessor - [db/be4c4d] Cached process > anat_wf:anat_get_t1w_file_in_bids
May-29 17:23:18.424 [Actor Thread 329] INFO  nextflow.processor.TaskProcessor - [08/da44fe] Cached process > anat_wf:anat_create_subject_orig_dir (sub-1001)
May-29 17:23:18.434 [Actor Thread 335] INFO  nextflow.processor.TaskProcessor - [c2/0a03aa] Cached process > anat_wf:anat_motioncor (sub-1001)
May-29 17:23:18.445 [Actor Thread 355] INFO  nextflow.processor.TaskProcessor - [e7/b29d15] Cached process > anat_wf:split_hemi_rawavg_mgz (sub-1001)
May-29 17:23:18.445 [Actor Thread 353] INFO  nextflow.processor.TaskProcessor - [a7/0555cd] Cached process > anat_wf:split_hemi_orig_mgz (sub-1001)
May-29 17:23:18.445 [Actor Thread 360] INFO  nextflow.processor.TaskProcessor - [ba/6db93e] Cached process > anat_wf:split_hemi_orig_mgz (sub-1001)
May-29 17:23:18.445 [Actor Thread 361] INFO  nextflow.processor.TaskProcessor - [93/7237b6] Cached process > anat_wf:split_hemi_rawavg_mgz (sub-1001)
May-29 17:23:18.448 [Actor Thread 347] INFO  nextflow.processor.TaskProcessor - [09/b3f0c0] Cached process > anat_wf:anat_segment (sub-1001)
May-29 17:23:18.457 [Actor Thread 383] INFO  nextflow.processor.TaskProcessor - [52/c194fb] Cached process > anat_wf:anat_reduce_to_aseg (sub-1001)
May-29 17:23:18.480 [Actor Thread 397] INFO  nextflow.processor.TaskProcessor - [9f/3f36f6] Cached process > anat_wf:anat_N4_bias_correct (sub-1001)
May-29 17:23:18.492 [Actor Thread 405] INFO  nextflow.processor.TaskProcessor - [7b/7286e7] Cached process > anat_wf:anat_talairach_and_nu (sub-1001)
May-29 17:23:18.500 [Actor Thread 413] INFO  nextflow.processor.TaskProcessor - [fd/102e61] Cached process > anat_wf:anat_T1 (sub-1001)
May-29 17:23:18.500 [Actor Thread 416] INFO  nextflow.processor.TaskProcessor - [98/abf55b] Cached process > anat_wf:anat_brainmask (sub-1001)
May-29 17:23:18.509 [Actor Thread 447] INFO  nextflow.processor.TaskProcessor - [b3/76baad] Cached process > anat_wf:split_hemi_brainmask_mgz (sub-1001)
May-29 17:23:18.509 [Actor Thread 440] INFO  nextflow.processor.TaskProcessor - [71/f38489] Cached process > anat_wf:split_hemi_brainmask_mgz (sub-1001)
May-29 17:23:18.509 [Actor Thread 444] INFO  nextflow.processor.TaskProcessor - [4c/0518f7] Cached process > anat_wf:anat_paint_cc_to_aseg (sub-1001)
May-29 17:23:18.520 [Actor Thread 462] INFO  nextflow.processor.TaskProcessor - [1c/30eebd] Cached process > anat_wf:anat_fill (sub-1001)
May-29 17:23:18.548 [Actor Thread 479] INFO  nextflow.processor.TaskProcessor - [d0/954093] Cached process > anat_wf:split_hemi_aseg_presurf_mgz (sub-1001)
May-29 17:23:18.548 [Actor Thread 490] INFO  nextflow.processor.TaskProcessor - [33/b3d52e] Cached process > anat_wf:split_hemi_brain_finalsurfs_mgz (sub-1001)
May-29 17:23:18.548 [Actor Thread 476] INFO  nextflow.processor.TaskProcessor - [11/a5c0d4] Cached process > anat_wf:subject_id_hemi_lh (sub-1001)
May-29 17:23:18.548 [Actor Thread 488] INFO  nextflow.processor.TaskProcessor - [25/8fd328] Cached process > anat_wf:split_hemi_aseg_presurf_mgz (sub-1001)
May-29 17:23:18.548 [Actor Thread 489] INFO  nextflow.processor.TaskProcessor - [92/090f06] Cached process > anat_wf:split_hemi_wm_mgz (sub-1001)
May-29 17:23:18.548 [Actor Thread 474] INFO  nextflow.processor.TaskProcessor - [c6/161e4f] Cached process > anat_wf:subject_id_hemi_rh (sub-1001)
May-29 17:23:18.548 [Actor Thread 496] INFO  nextflow.processor.TaskProcessor - [ae/307c38] Cached process > anat_wf:split_hemi_wm_mgz (sub-1001)
May-29 17:23:18.550 [Actor Thread 497] INFO  nextflow.processor.TaskProcessor - [90/b14bca] Cached process > anat_wf:split_hemi_brain_finalsurfs_mgz (sub-1001)
May-29 17:23:18.551 [Actor Thread 507] INFO  nextflow.processor.TaskProcessor - [f2/47ea44] Cached process > anat_wf:split_hemi_filled_mgz (sub-1001)
May-29 17:23:18.551 [Actor Thread 499] INFO  nextflow.processor.TaskProcessor - [7e/a4d3d0] Cached process > anat_wf:split_hemi_filled_mgz (sub-1001)
May-29 17:23:18.554 [Actor Thread 504] INFO  nextflow.processor.TaskProcessor - [be/fb72e0] Cached process > anat_wf:anat_fastcsr_levelset (sub-1001)
May-29 17:23:18.554 [Actor Thread 512] INFO  nextflow.processor.TaskProcessor - [1b/cea6bf] Cached process > anat_wf:anat_fastcsr_levelset (sub-1001)
May-29 17:23:18.568 [Actor Thread 626] INFO  nextflow.processor.TaskProcessor - [0e/954b3d] Cached process > anat_wf:anat_fastcsr_mksurface (sub-1001)
May-29 17:23:18.568 [Actor Thread 620] INFO  nextflow.processor.TaskProcessor - [d7/1add9e] Cached process > anat_wf:anat_fastcsr_mksurface (sub-1001)
May-29 17:23:18.577 [Actor Thread 650] INFO  nextflow.processor.TaskProcessor - [bb/5cc7ae] Cached process > anat_wf:anat_autodet_gwstats (sub-1001)
May-29 17:23:18.577 [Actor Thread 654] INFO  nextflow.processor.TaskProcessor - [5b/5d2dbc] Cached process > anat_wf:anat_autodet_gwstats (sub-1001)
May-29 17:23:18.593 [Actor Thread 678] INFO  nextflow.processor.TaskProcessor - [29/db45ac] Cached process > anat_wf:anat_white_surface (sub-1001)
May-29 17:23:18.593 [Actor Thread 675] INFO  nextflow.processor.TaskProcessor - [e9/5d1043] Cached process > anat_wf:anat_white_surface (sub-1001)
May-29 17:23:18.604 [Actor Thread 696] INFO  nextflow.processor.TaskProcessor - [88/67f7b2] Cached process > anat_wf:anat_cortex_hipamyg_label (sub-1001)
May-29 17:23:18.605 [Actor Thread 698] INFO  nextflow.processor.TaskProcessor - [f6/f9f1a2] Cached process > anat_wf:anat_cortex_label (sub-1001)
May-29 17:23:18.605 [Actor Thread 723] INFO  nextflow.processor.TaskProcessor - [00/7ccdee] Cached process > anat_wf:anat_smooth_inflated (sub-1001)
May-29 17:23:18.605 [Actor Thread 687] INFO  nextflow.processor.TaskProcessor - [93/809c34] Cached process > anat_wf:anat_smooth_inflated (sub-1001)
May-29 17:23:18.605 [Actor Thread 731] INFO  nextflow.processor.TaskProcessor - [ea/609c43] Cached process > anat_wf:anat_cortex_label (sub-1001)
May-29 17:23:18.607 [Actor Thread 733] INFO  nextflow.processor.TaskProcessor - [d8/dd61bb] Cached process > anat_wf:anat_cortex_hipamyg_label (sub-1001)
May-29 17:23:18.608 [Actor Thread 748] INFO  nextflow.processor.TaskProcessor - [16/61fb9b] Cached process > anat_wf:anat_hyporelabel (sub-1001)
May-29 17:23:18.619 [Actor Thread 853] INFO  nextflow.processor.TaskProcessor - [c3/55e488] Cached process > anat_wf:anat_curv_stats (sub-1001)
May-29 17:23:18.619 [Actor Thread 826] INFO  nextflow.processor.TaskProcessor - [86/e82a88] Cached process > anat_wf:anat_sphere (sub-1001)
May-29 17:23:18.619 [Actor Thread 827] INFO  nextflow.processor.TaskProcessor - [da/4f89d5] Cached process > anat_wf:anat_curv_stats (sub-1001)
May-29 17:23:18.628 [Actor Thread 899] INFO  nextflow.processor.TaskProcessor - [de/61da64] Cached process > anat_wf:anat_sphere_register (sub-1001)
May-29 17:23:18.637 [Actor Thread 909] INFO  nextflow.processor.TaskProcessor - [dd/f6a5d2] Cached process > anat_wf:anat_jacobian (sub-1001)
May-29 17:23:18.638 [Actor Thread 912] INFO  nextflow.processor.TaskProcessor - [34/9c3b03] Cached process > anat_wf:anat_cortparc_aparc (sub-1001)
May-29 17:23:18.655 [Actor Thread 933] INFO  nextflow.processor.TaskProcessor - [17/cb9972] Cached process > anat_wf:anat_pial_surface (sub-1001)
May-29 17:23:18.681 [Actor Thread 988] INFO  nextflow.processor.TaskProcessor - [54/5b4ee2] Cached process > anat_wf:anat_pctsurfcon (sub-1001)
May-29 17:23:18.692 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process anat_wf:anat_sphere (sub-1001) > jobId: 6803803; workDir: /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflo$
May-29 17:23:18.692 [Task submitter] INFO  nextflow.Session - [a1/df9023] Submitted process > anat_wf:anat_sphere (sub-1001)
May-29 17:26:03.270 [Task monitor] DEBUG nextflow.file.FileHelper - NFS path (false): /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow
May-29 17:28:03.274 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor slurm > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[jobId: 6803803; id: 46; name: anat_wf:anat_sphere (sub-1001); status: RUNNING; exit: -; error: -; workDir: /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/a1/df902374f24871a8d332460f958550 start$
May-29 17:30:28.278 [Task monitor] DEBUG nextflow.executor.GridTaskHandler - Failed to get exit status for process TaskHandler[jobId: 6803803; id: 46; name: anat_wf:anat_sphere (sub-1001); status: RUNNING; exit: -; error: -; workDir: /d$
Current queue status:
>   (empty)

Content of workDir: /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/a1/df902374f24871a8d332460f958550
null
May-29 17:30:28.279 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 6803803; id: 46; name: anat_wf:anat_sphere (sub-1001); status: COMPLETED; exit: -; error: -; workDir: /data/gpfs-1/groups/ag_s$
May-29 17:30:28.283 [TaskFinalizer-2] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=anat_wf:anat_sphere (sub-1001); work-dir=/data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/a1/df902374f24871a8d332460f958550
  error [nextflow.exception.ProcessFailedException]: Process `anat_wf:anat_sphere (sub-1001)` terminated for an unknown reason -- Likely it has been terminated by the external system
May-29 17:30:28.317 [TaskFinalizer-2] ERROR nextflow.processor.TaskProcessor - Error executing process > 'anat_wf:anat_sphere (sub-1001)'

Caused by:
  Process `anat_wf:anat_sphere (sub-1001)` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  mris_sphere -threads 8 -seed 1234 /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/Recon/sub-1001/surf/lh.inflated /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/Recon/sub-1001/surf/lh.sphere

Command exit status:
  -

Command output:
  version: 7.2.0
  available threads: 8
  scaling brain by 0.281...
  MRISunfold() max_passes = 1 -------
  tol=5.0e-01, sigma=0.0, host=unkno, nav=1024, nbrs=2, l_area=1.000, l_dist=1.000
  using quadratic fit line minimization
  complete_dist_mat 0
  rms 0
  smooth_averages 0
  remove_neg 0
  ico_order 0
  which_surface 0
  target_radius 0.000000
  nfields 0
  scale 1.000000
  desired_rms_height -1.000000
  momentum 0.900000
  nbhd_size 7
  max_nbrs 8
  niterations 25
  nsurfaces 0
  SURFACES 3
  flags 0 (0)
  use curv 0
  no sulc 0
  no rigid align 0
  mris->nsize 2
  mris->hemisphere 0
  randomSeed 1234

Command error:
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_TMPDIR as environment variable will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_TASK_WORKDIR as environment variable will not be supported in the future, use APPTAINERENV_NXF_TASK_WORKDIR instead
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (455) bind mounts
  Setting OMP_NUM_THREADS=8
  setting seed for random number genererator to 1234
  reading original vertex positions...
  projecting onto sphere...
  surface projected - minimizing metric distortion...

Work dir:
  /data/gpfs-1/groups/ag_schumann/work/jero14_c/derivatives/WorkDir/nextflow/a1/df902374f24871a8d332460f958550

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
May-29 17:30:28.323 [TaskFinalizer-2] DEBUG nextflow.Session - Session aborted -- Cause: Process `anat_wf:anat_sphere (sub-1001)` terminated for an unknown reason -- Likely it has been terminated by the external system
May-29 17:30:28.335 [main] DEBUG nextflow.Session - Session await > all processes finished
May-29 17:30:28.369 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: slurm) - terminating tasks monitor poll loop
May-29 17:30:28.369 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-29 17:30:28.381 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=1; ignoredCount=0; cachedCount=50; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abort$
May-29 17:30:28.381 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
May-29 17:30:29.439 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
May-29 17:30:29.606 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-29 17:30:29.656 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
Ireneyou33 commented 1 month ago

Hi Jean-Charles,

The unexpected termination of anat_sphere only happens when running the full pipeline, most likely because the job hits the walltime limit you set (`--time=00:30:00`). Try allocating a more generous walltime for both the CPU and GPU jobs.
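For example, a sketch of the SBATCH header with a longer walltime (the partition name and resource values below are site-specific assumptions; adjust them to your cluster):

```shell
#!/bin/bash
#SBATCH --job-name=deepprep_job
#SBATCH --partition=gpu        # assumption: your site's GPU partition
#SBATCH --time=08:00:00        # generous walltime instead of 00:30:00
#SBATCH --cpus-per-task=30
#SBATCH --mem=40G
```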

Good luck :)

JChRoy commented 1 month ago

Hi Irene,

Actually, the problem was an inconsistency between the SLURM batch script and the Nextflow configuration file: Nextflow was submitting jobs to a partition that did not have sufficient resources for the command, as you suggested. Setting the correct partition in the `queue` parameter of the configuration file resolved the issue.
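For anyone hitting the same error, a minimal sketch of the relevant Nextflow configuration (the partition name and resource values are placeholders for your site, not DeepPrep defaults):

```groovy
// nextflow.config (sketch)
process {
    executor = 'slurm'
    queue    = 'gpu'   // must match a partition with enough CPU/GPU and walltime

    // optional per-process override for the expensive surface step
    withName: 'anat_wf:anat_sphere' {
        cpus = 8
        time = '2h'
    }
}
```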

Thanks a lot for your answer. Best, Jean-Charles