nipreps / nibabies

fMRIPrep-Infants - A robust, transparent workflow tailored for neonate and infant MRI
https://nibabies.readthedocs.io/en/latest
Apache License 2.0

Memory Issues in the Subcortical Realignment #176

Closed audreymhoughton closed 2 years ago

audreymhoughton commented 2 years ago

What happened?

The run fails with a timeout / memory issue when we add the --cifti-output 91k flag. We believe the failure occurs in the subcortical realignment stage. We had a similar issue in the dcan-infant-pipeline.

What command did you use?

env -i ${singularity} run --cleanenv \
-B /home/faird/shared/projects/nibabies_test/BCP2:/data:ro \
-B /home/faird/shared/projects/nibabies_test/BCP_output2:/out \
-B /home/faird/shared/projects/nibabies_test/license.txt:/opt/freesurfer/license.txt \
-B /home/faird/shared/projects/nibabies_test/work2/:/work \
/home/faird/shared/code/external/pipelines/nibabies/nibabies-21.0.2.sif /data /out participant --age-months 12 \
--output-spaces MNIInfant:cohort-4 \
--cifti-output 91k \
-w /work

What version of NiBabies are you using?

21.0.2

Relevant log output

bids-validator@1.8.5
bids-specification@disable
    1: [WARN] The recommended file /README is missing. See Section 03 (Modality agnostic files) of the BIDS specification. (code: 101 - README_FILE_MISSING)

    Please visit https://neurostars.org/search?q=README_FILE_MISSING for existing conversations about this issue.

        Summary:                  Available Tasks:        Available Modalities: 
        14 Files, 886.95MB        rest                    MRI                   
        1 - Subject                                                             
        1 - Session                                                             

    If you have any questions, please post on https://neurostars.org/tags/bids.
220108-00:09:52,135 nipype.workflow IMPORTANT:

    Running nibabies version 21.0.2+0.g7833092.dirty:
      * BIDS dataset path: /data.
      * Participant list: ['CENSORED'].
      * Run identifier: 20220108-000938_08d1d971-e221-4029-a3d9-1d87deffc098.
      * Output spaces: MNIInfant:cohort-4:res-native.
      * Pre-run FreeSurfer's SUBJECTS_DIR: /out/sourcedata/infant-freesurfer.
220108-00:09:55,926 nipype.workflow INFO:
     No single-band-reference found for sub-CENSORED_ses-12mo_task-rest_acq-AP_run-02_bold.nii.gz.
220108-00:09:55,931 nipype.workflow INFO:
     Found usable B0 fieldmap <('auto_00000',)>
220108-00:10:08,918 nipype.workflow IMPORTANT:
     BOLD series will be slice-timing corrected to an offset of 0.311s.
220108-00:10:09,550 nipype.workflow INFO:
     No single-band-reference found for sub-CENSORED_ses-12mo_task-rest_acq-PA_run-01_bold.nii.gz.
220108-00:10:09,554 nipype.workflow INFO:
     Found usable B0 fieldmap <('auto_00000',)>
220108-00:10:22,89 nipype.workflow IMPORTANT:
     BOLD series will be slice-timing corrected to an offset of 0.31s.
220108-00:10:23,481 nipype.workflow INFO:
     Fieldmap estimators found: [<EstimatorType.PEPOLAR: 2>]
220108-00:10:24,57 nipype.workflow INFO:
     Setting-up fieldmap "auto_00000" (EstimatorType.PEPOLAR) with <sub-CENSORED_ses-12mo_dir-AP_run-02_epi.nii.gz, sub-CENSORED_ses-12mo_dir-PA_run-01_epi.nii.gz>
220108-00:10:27,786 nipype.workflow INFO:
     NiBabies workflow graph with 639 nodes built successfully.
220108-00:10:49,449 nipype.workflow IMPORTANT:
     nibabies started!
220108-00:11:28,743 nipype.workflow INFO:
     [Node] Setting-up "_ds_coeff0" in "/work/nibabies_wf/single_subject_CENSORED_wf/fmap_preproc_wf/fmap_derivatives_wf_auto_00000/ds_coeff/mapflow/_ds_coeff0".
220108-00:11:28,793 nipype.workflow INFO:
     [Node] Executing "_ds_coeff0" <sdcflows.workflows.outputs.DerivativesDataSink>
220108-00:11:28,953 nipype.workflow INFO:
     [Node] Finished "_ds_coeff0", elapsed time 0.121936s.
220108-00:11:36,827 nipype.workflow INFO:
     [Node] Setting-up "nibabies_wf.single_subject_CENSORED_wf.fmap_preproc_wf.fmap_derivatives_wf_auto_00000.ds_reference" in "/work/nibabies_wf/single_subject_CENSORED_wf/fmap_preproc_wf/fmap_derivatives_wf_auto_00000/ds_reference".
220108-00:11:37,4 nipype.workflow INFO:
     [Node] Executing "ds_reference" <sdcflows.workflows.outputs.DerivativesDataSink>
220108-00:11:37,160 nipype.workflow INFO:
     [Node] Setting-up "nibabies_wf.single_subject_CENSORED_wf.fmap_preproc_wf.fmap_reports_wf_auto_00000.fmap_rpt" in "/work/nibabies_wf/single_subject_CENSORED_wf/fmap_preproc_wf/fmap_reports_wf_auto_00000/fmap_rpt".
220108-00:11:37,204 nipype.workflow INFO:
     [Node] Executing "fmap_rpt" <sdcflows.interfaces.reportlets.FieldmapReportlet>
220108-00:11:37,310 nipype.workflow INFO:
     [Node] Finished "ds_reference", elapsed time 0.26708s.
220108-00:11:39,2 nipype.workflow INFO:
     [Node] Setting-up "nibabies_wf.single_subject_CENSORED_wf.fmap_preproc_wf.fmap_derivatives_wf_auto_00000.ds_fieldmap" in "/work/nibabies_wf/single_subject_CENSORED_wf/fmap_preproc_wf/fmap_derivatives_wf_auto_00000/ds_fieldmap".
220108-00:11:39,50 nipype.workflow INFO:
     [Node] Executing "ds_fieldmap" <sdcflows.workflows.outputs.DerivativesDataSink>
220108-00:11:39,372 nipype.workflow INFO:
     [Node] Finished "ds_fieldmap", elapsed time 0.285122s.
220108-00:11:49,797 nipype.workflow INFO:
     [Node] Finished "fmap_rpt", elapsed time 12.555629s.
220108-00:11:57,244 nipype.workflow INFO:
     [Node] Setting-up "nibabies_wf.single_subject_CENSORED_wf.fmap_preproc_wf.fmap_reports_wf_auto_00000.ds_fmap_report" in "/work/nibabies_wf/single_subject_CENSORED_wf/fmap_preproc_wf/fmap_reports_wf_auto_00000/ds_fmap_report".
220108-00:11:57,249 nipype.workflow INFO:
     [Node] Outdated cache found for "nibabies_wf.single_subject_CENSORED_wf.fmap_preproc_wf.fmap_reports_wf_auto_00000.ds_fmap_report".
220108-00:11:57,297 nipype.workflow INFO:
     [Node] Executing "ds_fmap_report" <sdcflows.workflows.outputs.DerivativesDataSink>
220108-00:11:57,415 nipype.workflow INFO:
     [Node] Finished "ds_fmap_report", elapsed time 0.078601s.
220108-00:11:58,405 nipype.workflow INFO:
     [Node] Setting-up "nibabies_wf.single_subject_CENSORED_wf.infant_anat_wf.infant_surface_recon_wf.gifti_surface_wf.get_surfaces" in "/work/nibabies_wf/single_subject_CENSORED_wf/infant_anat_wf/infant_surface_recon_wf/gifti_surface_wf/get_surfaces".
220108-00:11:58,455 nipype.workflow INFO:
     [Node] Executing "get_surfaces" <nipype.interfaces.io.FreeSurferSource>
220108-00:11:58,530 nipype.workflow INFO:
     [Node] Finished "get_surfaces", elapsed time 0.03701s.
220108-00:11:59,836 nipype.workflow INFO:
     [Node] Setting-up "nibabies_wf.single_subject_CENSORED_wf.infant_anat_wf.anat_reports_wf.recon_report" in "/work/nibabies_wf/single_subject_CENSORED_wf/infant_anat_wf/anat_reports_wf/recon_report".
220108-00:11:59,920 nipype.workflow INFO:
     [Node] Executing "recon_report" <smriprep.interfaces.reports.FSSurfaceReport>
220108-00:12:13,964 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/anat/sub-CENSORED_ses-12mo_run-1_desc-aseg_dseg.nii.gz dtype from int32 to int16
220108-00:12:15,989 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/anat/sub-CENSORED_ses-12mo_run-1_desc-aparcaseg_dseg.nii.gz dtype from int32 to int16
220108-00:12:18,244 nipype.workflow INFO:
     [Node] Setting-up "_midthickness0" in "/work/nibabies_wf/single_subject_CENSORED_wf/infant_anat_wf/infant_surface_recon_wf/gifti_surface_wf/midthickness/mapflow/_midthickness0".
220108-00:12:18,248 nipype.workflow INFO:
     [Node] Setting-up "_midthickness1" in "/work/nibabies_wf/single_subject_CENSORED_wf/infant_anat_wf/infant_surface_recon_wf/gifti_surface_wf/midthickness/mapflow/_midthickness1".
220108-00:12:18,319 nipype.workflow INFO:
     [Node] Executing "_midthickness0" <niworkflows.interfaces.freesurfer.MakeMidthickness>
220108-00:12:18,324 nipype.workflow INFO:
     [Node] Executing "_midthickness1" <niworkflows.interfaces.freesurfer.MakeMidthickness>
220108-00:12:35,489 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/anat/sub-CENSORED_ses-12mo_run-1_space-MNIInfant_cohort-4_desc-brain_mask.nii.gz dtype from float64 to uint8
220108-00:12:36,505 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/anat/sub-CENSORED_ses-12mo_run-1_space-MNIInfant_cohort-4_dseg.nii.gz dtype from float64 to int16
220108-00:13:04,371 nipype.workflow INFO:
     [Node] Finished "recon_report", elapsed time 64.412049s.
220108-00:13:22,210 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/func/sub-CENSORED_ses-12mo_task-rest_acq-AP_run-2_space-MNIInfant_cohort-4_desc-aseg_dseg.nii.gz dtype from float64 to int16
220108-00:13:22,579 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/func/sub-CENSORED_ses-12mo_task-rest_acq-AP_run-2_space-MNIInfant_cohort-4_desc-aparcaseg_dseg.nii.gz dtype from float64 to int16
220108-00:14:27,248 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/func/sub-CENSORED_ses-12mo_task-rest_acq-AP_run-2_space-MNIInfant_cohort-4_desc-brain_mask.nii.gz dtype from float64 to uint8
220108-00:14:31,164 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/func/sub-CENSORED_ses-12mo_task-rest_acq-PA_run-1_space-MNIInfant_cohort-4_desc-aseg_dseg.nii.gz dtype from float64 to int16
220108-00:14:31,526 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/func/sub-CENSORED_ses-12mo_task-rest_acq-PA_run-1_space-MNIInfant_cohort-4_desc-aparcaseg_dseg.nii.gz dtype from float64 to int16
220108-00:15:34,740 nipype.interface WARNING:
     Changing /out/sub-CENSORED/ses-12mo/func/sub-CENSORED_ses-12mo_task-rest_acq-PA_run-1_space-MNIInfant_cohort-4_desc-brain_mask.nii.gz dtype from float64 to uint8
220108-00:15:45,393 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi2" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi2".
220108-00:15:45,397 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi1" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi1".
220108-00:15:45,397 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi0" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi0".
220108-00:15:45,400 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi3" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi3".
220108-00:15:45,408 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi4" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi4".
220108-00:15:45,409 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi6" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi6".
220108-00:15:45,411 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi7" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi7".
220108-00:15:45,420 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi9" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi9".
220108-00:15:45,420 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi8" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi8".
220108-00:15:45,423 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi10" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi10".
220108-00:15:45,433 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi12" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi12".
220108-00:15:45,434 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi11" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi11".
220108-00:15:45,458 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi2" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:45,469 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi1" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:45,498 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi4" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:45,498 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi0" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:45,502 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi3" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:45,522 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi6" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:45,524 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi7" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:46,69 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi9" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:46,79 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi11" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:46,79 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi12" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:46,79 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi8" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:46,152 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi10" <nipype.interfaces.fsl.preprocess.ApplyXFM>
220108-00:15:47,515 nipype.workflow INFO:
     [Node] Setting-up "_applyxfm_roi13" in "/work/nibabies_wf/single_subject_CENSORED_wf/func_preproc_ses_12mo_task_rest_acq_AP_run_02_wf/subcortical_mni_alignment_wf/applyxfm_roi/mapflow/_applyxfm_roi13".
220108-00:15:47,567 nipype.workflow INFO:
     [Node] Executing "_applyxfm_roi13" <nipype.interfaces.fsl.preprocess.ApplyXFM>

Add any additional information or context about the problem here.

The log above is the .out.

Here is the .err:

slurmstepd: error: *** JOB 9837684 ON cn0195 CANCELLED AT 2022-01-08T00:15:48 DUE TO NODE FAILURE, SEE SLURMCTLD LOG FOR DETAILS ***
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=9837684.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: *** JOB 9837684 ON cn0405 CANCELLED AT 2022-01-11T00:09:34 DUE TO TIME LIMIT ***

Here are the requested resources:

#SBATCH -J nibabies
#SBATCH --ntasks=24
#SBATCH --tmp=10gb
#SBATCH --mem=60gb
#SBATCH -t 72:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=hough129@umn.edu
#SBATCH -p small,amdsmall
#SBATCH -o output_logs/nibabies_full_%A_%a.out
#SBATCH -e output_logs/nibabies_full_%A_%a.err
#SBATCH -A miran045

mgxd commented 2 years ago

I would try reducing the number of simultaneous jobs with the --nproc flag (maybe limiting it to 12 or 6). Since you are requesting 24 tasks (generally 1 core per task), nibabies will allow up to 24 jobs to run simultaneously. Depending on the resolution of your BOLD image(s) and the number of timepoints, this can cause a large spike in the memory actually consumed.
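
As a rough illustration of the reasoning above, here is a minimal shell sketch that picks a worker cap from the memory budget rather than from the core count alone. The helper name cap_nprocs and the 5 GB-per-job figure are assumptions made for this example, not values measured from nibabies; the 24 tasks and 60 GB come from the #SBATCH header in the report.

```shell
# cap_nprocs CORES MEM_GB PER_JOB_GB
# Hypothetical helper: prints the smaller of CORES and
# floor(MEM_GB / PER_JOB_GB), but never less than 1.
cap_nprocs() {
    cores=$1
    mem_gb=$2
    per_job_gb=$3
    by_mem=$((mem_gb / per_job_gb))
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1
    fi
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# 24 tasks and 60 GB are taken from the #SBATCH header above.
# 5 GB per simultaneous job is an assumed placeholder, not a measurement.
NPROCS=$(cap_nprocs 24 60 5)
echo "suggested worker cap: ${NPROCS}"
```

The resulting value would then be appended to the nibabies command through the flag mentioned above (written --nproc in this comment; the exact spelling is worth confirming against the --help output of the installed version). If the job is still killed, lowering the cap further or raising --mem in the batch script are the two obvious levers.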

audreymhoughton commented 2 years ago

Trying this now - I will let you know how it goes.

mgxd commented 2 years ago

If this didn't work, feel free to reopen.