master branch failing without `--use_gpu`

jordandekraker commented 3 years ago

I notice the cpu_inference branch is still open, but commits were also merged to the master branch. Not sure if that was the intended behaviour, but for now this seems to be breaking the master branch. The same command works perfectly fine with the --use_gpu flag added.

Command:

hippunfold BigBrain/bidsT1 BigBrain/bidsT1_unfolded participant --modality T1w --skip_preproc --skip_coreg --profile cc-slurm

Relevant info from slurm log file:

rule run_inference:
    input: work/sub-001/anat/sub-001_hemi-R_space-corobl_desc-cropped_T1w.nii.gz, /home/jdekrake/.cache/hippunfold/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar
    output: work/sub-001/seg_T1w/sub-001_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
    jobid: 6
    wildcards: subject=001, modality=T1w, hemi=R
    threads: 16
    resources: mem_mb=32000, gpus=0, time=60

Job counts:
    count   jobs
    1   run_inference
    1
'work/sub-001/anat/sub-001_hemi-R_space-corobl_desc-cropped_T1w.nii.gz' -> 'tempimg/temp_0000.nii.gz'
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/debug.json
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/progress.png
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/model_latest.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/training_log_2021_1_26_13_22_01.txt
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/model_best.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/model_latest.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/model_best.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/plans.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/debug.json
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/training_log_2021_1_26_13_20_12.txt
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/progress.png
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/model_latest.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/model_best.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/model_latest.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/model_best.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/debug.json
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/progress.png
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/model_latest.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/model_best.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/model_latest.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/model_best.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/training_log_2021_1_26_13_44_33.txt
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/debug.json
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/progress.png
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/model_latest.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/model_best.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/model_latest.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/model_best.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/training_log_2021_1_26_13_43_41.txt
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/debug.json
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/training_log_2021_1_26_13_20_12.txt
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/progress.png
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/model_latest.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/model_best.model.pkl
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/model_latest.model
nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/model_best.model

Please cite the following paper when using nnUNet:

Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nat Methods (2020). https://doi.org/10.1038/s41592-020-01008-z

If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

nnUNet_raw_data_base is not defined and nnU-Net can only be used on data for which preprocessed files are already present on your system. nnU-Net cannot be used for experiment planning and preprocessing like this. If this is not intended, please read nnunet/paths.md for information on how to set this up properly.
nnUNet_preprocessed is not defined and nnU-Net can not be used for preprocessing or training. If this is not intended, please read nnunet/pathy.md for information on how to set this up.
using model stored in  tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1
This model expects 1 input modalities for each image
Found 1 unique case ids, here are some examples: ['temp']
If they don't look right, make sure to double check your filenames. They must end with _0000.nii.gz etc
number of cases: 1
number of cases that still need to be predicted: 1
emptying cuda cache
loading parameters for folds, None
folds is None so we will automatically look for output folders (not using 'all'!)
found the following folds:  ['tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4']
using the following model files:  ['tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_0/model_best.model', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_1/model_best.model', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_2/model_best.model', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_3/model_best.model', 'tempmodel/nnUNet/3d_fullres/Task101_hcp1200_T1w/nnUNetTrainerV2__nnUNetPlansv2.1/fold_4/model_best.model']
starting preprocessing generator
starting prediction...
preprocessing templbl/temp.nii.gz
using preprocessor GenericPreprocessor
before crop: (1, 128, 256, 128) after crop: (1, 128, 256, 128) spacing: [0.30000001 0.30000001 0.30000001] 

no resampling necessary
no resampling necessary
before: {'spacing': array([0.30000001, 0.30000001, 0.30000001]), 'spacing_transposed': array([0.30000001, 0.30000001, 0.30000001]), 'data.shape (data is transposed)': (1, 128, 256, 128)} 
after:  {'spacing': array([0.30000001, 0.30000001, 0.30000001]), 'data.shape (data is resampled)': (1, 128, 256, 128)} 

(1, 128, 256, 128)
This worker has ended successfully, no errors to report
predicting templbl/temp.nii.gz
Traceback (most recent call last):
  File "/scratch/jdekrake/hippdev/venv/bin/nnUNet_predict", line 8, in <module>
    sys.exit(main())
  File "/scratch/jdekrake/hippdev/venv/lib/python3.7/site-packages/nnunet/inference/predict_simple.py", line 221, in main
    step_size=step_size, checkpoint_name=args.chk)
  File "/scratch/jdekrake/hippdev/venv/lib/python3.7/site-packages/nnunet/inference/predict.py", line 634, in predict_from_folder
    segmentation_export_kwargs=segmentation_export_kwargs)
  File "/scratch/jdekrake/hippdev/venv/lib/python3.7/site-packages/nnunet/inference/predict.py", line 214, in predict_cases
    trainer.load_checkpoint_ram(p, False)
  File "/scratch/jdekrake/hippdev/venv/lib/python3.7/site-packages/nnunet/training/network_training/network_trainer.py", line 356, in load_checkpoint_ram
    self.amp_grad_scaler.load_state_dict(checkpoint['amp_grad_scaler'])
AttributeError: 'NoneType' object has no attribute 'load_state_dict'
[Mon Feb  8 17:21:20 2021]
Error in rule run_inference:
    jobid: 0
    output: work/sub-001/seg_T1w/sub-001_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
    shell:
        mkdir -p tempmodel tempimg templbl && cp -v work/sub-001/anat/sub-001_hemi-R_space-corobl_desc-cropped_T1w.nii.gz tempimg/temp_0000.nii.gz && tar -xvf /home/jdekrake/.cache/hippunfold/trained_model.3d_fullres.Task101_hcp1200_T1w.nnUNetTrainerV2.model_best.tar -C tempmodel && export RESULTS_FOLDER=tempmodel && export nnUNet_n_proc_DA=16 && nnUNet_predict -i tempimg -o templbl -t Task101_hcp1200_T1w -chk model_best --disable_tta && cp -v templbl/temp.nii.gz work/sub-001/seg_T1w/sub-001_hemi-R_space-corobl_desc-nnunet_dseg.nii.gz
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Exiting because a job execution failed. Look above for error message
HIPPUNFOLD_CACHE_DIR not defined, using default location
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
HIPPUNFOLD_CACHE_DIR not defined, using default location
HIPPUNFOLD_CACHE_DIR not defined, using default location

akhanf commented 3 years ago

Ah no, it's not broken. I forgot to mention that you'll have to uninstall nnunet and reinstall hippunfold, since it relies on a different fork of nnunet (already in the setup.py).
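
For anyone reading the traceback later: it fails because `self.amp_grad_scaler` is `None` at the point where the checkpoint's scaler state is loaded. Below is a minimal sketch of the kind of guard that avoids this. It is not the code from either nnunet fork; `FakeScaler` and `restore_scaler` are hypothetical names made up for illustration, and the idea that no scaler is created on CPU-only runs is an assumption based on `gpus=0` in the log.

```python
class FakeScaler:
    """Hypothetical stand-in for a gradient scaler; it only stores the state it is given."""

    def __init__(self):
        self.state = None

    def load_state_dict(self, state):
        self.state = state


def restore_scaler(scaler, checkpoint):
    """Restore scaler state only when a scaler object actually exists.

    Returns True if the state was restored and False if the step was skipped.
    """
    if scaler is None:
        # Assumed CPU case: nothing to restore, so skip instead of raising AttributeError.
        return False
    if "amp_grad_scaler" not in checkpoint:
        # Checkpoint was saved without scaler state.
        return False
    scaler.load_state_dict(checkpoint["amp_grad_scaler"])
    return True


# Illustrative checkpoint contents; the real checkpoint layout is not shown in the log.
checkpoint = {"amp_grad_scaler": {"scale": 65536.0}}

# No scaler available: the restore is skipped rather than crashing.
cpu_restored = restore_scaler(None, checkpoint)

# Scaler available: its state is loaded from the checkpoint.
gpu_scaler = FakeScaler()
gpu_restored = restore_scaler(gpu_scaler, checkpoint)
```

The unguarded call in the traceback corresponds to calling `load_state_dict` directly on the scaler without the `None` check.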

jordandekraker commented 3 years ago

Working now with a clean venv.

However, I'm now trying to run the pipeline using --modality segT2w and get the following error:

hippunfold Manual/Maastricht/bids/ unfolding/Maastricht_manualseg participant --modality segT2w --profile cc-slurm
Building DAG of jobs...
InputFunctionException in line 48 of /scratch/jdekrake/hippdev/hippunfold/hippunfold/workflow/rules/nnunet.smk:
Error:
  ValueError: modality not supported for nnunet!
Wildcards:
  subject=05
  modality=segT2w
  hemi=Lflip
Traceback:
  File "/scratch/jdekrake/hippdev/hippunfold/hippunfold/workflow/rules/nnunet.smk", line 10, in get_nnunet_input

Traceback (most recent call last):
  File "/scratch/jdekrake/hippdev/venv/bin/hippunfold", line 33, in <module>
    sys.exit(load_entry_point('hippunfold', 'console_scripts', 'hippunfold')())
  File "/scratch/jdekrake/hippdev/hippunfold/hippunfold/run.py", line 14, in main
    app.run_snakemake()
  File "/scratch/jdekrake/hippdev/venv/lib/python3.7/site-packages/snakebids/app.py", line 209, in run_snakemake
    run(snakemake_cmd)
  File "/scratch/jdekrake/hippdev/venv/lib/python3.7/site-packages/snakebids/app.py", line 41, in run
    raise Exception("Non zero return code: %d"%process.returncode)
Exception: Non zero return code: 1

nnunet does not need to be run since I'm trying to specify a manually labelled hippocampus. Any ideas?

akhanf commented 3 years ago

Yeah I haven't kept the manual seg workflows going through these updates -- but should be easily fixed I think -- will fix it in your #29 PR
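
As a rough sketch of the behaviour being asked for (not the actual contents of nnunet.smk): the input function could stay strict about image modalities while the workflow decides earlier whether nnunet is needed at all. Everything here is hypothetical, including the names `SUPPORTED`, `is_manual_seg`, `nnunet_input_sketch` and `segmentation_source`, the supported-modality list, and the assumption that manual segmentations are identified by a `seg` prefix. The path pattern is copied from the log above.

```python
from types import SimpleNamespace

# Assumed list of image modalities that have a trained model.
SUPPORTED = ("T1w", "T2w")


def is_manual_seg(modality):
    """Assumption: manual segmentation modalities are named with a 'seg' prefix."""
    return modality.startswith("seg")


def nnunet_input_sketch(wildcards):
    """Return the cropped image path for nnunet, or raise for unsupported modalities."""
    if wildcards.modality not in SUPPORTED:
        raise ValueError("modality not supported for nnunet!")
    pattern = (
        "work/sub-{subject}/anat/"
        "sub-{subject}_hemi-{hemi}_space-corobl_desc-cropped_{modality}.nii.gz"
    )
    return pattern.format(
        subject=wildcards.subject,
        hemi=wildcards.hemi,
        modality=wildcards.modality,
    )


def segmentation_source(wildcards):
    """Decide which step supplies the segmentation for this job."""
    if is_manual_seg(wildcards.modality):
        # Manual labels are provided by the user, so nnunet is never asked for input.
        return "manual"
    nnunet_input_sketch(wildcards)  # raises early if the modality is unsupported
    return "nnunet"


# Wildcards from the failing job in the error message.
manual_job = SimpleNamespace(subject="05", modality="segT2w", hemi="Lflip")
# Wildcards from the earlier T1w job.
image_job = SimpleNamespace(subject="001", modality="T1w", hemi="R")
```

With this split, `segmentation_source(manual_job)` never reaches the nnunet input function, which is where the `ValueError` in the report comes from.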

jordandekraker commented 3 years ago

Sounds great, thanks. I'll add the Zenodo model files to that PR too, so don't merge yet!

akhanf commented 3 years ago

I noticed another bug in one of your edits on that branch. To keep things clean, I think we should have a separate branch for each feature. I'll make another branch for the manual seg fix, and another issue too (since this is no longer a CPU/GPU issue).

akhanf commented 3 years ago

Closing this and moving the manual seg discussion to open issue #11

khanlab / hippunfold

master branch failing without `--use_gpu` #33