hhaentze / MRSegmentator

Apache License 2.0

Can't run mrsegmentator with batchsize!=1 and assigned fold #13

Closed · Chuyun-Shen closed this 2 months ago

Chuyun-Shen commented 2 months ago

I have a directory data_path with some NIfTI files in it, and I run mrsegmentator --input "$data_path" --outdir "$output_path". After reading 8 images and predicting them, it raises the following error:

Done with image of shape torch.Size([1, 789, 233, 333]):

Predicting image of shape torch.Size([1, 773, 233, 333]):
perform_everything_on_gpu: True
Traceback (most recent call last):
  File "xxx/mrseg_venv_2/bin/mrsegmentator", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "xxx/mrseg_venv_2/lib/python3.12/site-packages/mrsegmentator/main.py", line 37, in main
    infer(
  File "xxx/mrseg_venv_2/lib/python3.12/site-packages/mrsegmentator/inference.py", line 89, in infer
    segmentations = predictor.predict_from_list_of_npy_arrays(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/mrseg_venv_2/lib/python3.12/site-packages/nnunetv2/inference/predict_from_raw_data.py", line 329, in predict_from_list_of_npy_arrays
    return self.predict_from_data_iterator(iterator, save_probabilities, num_processes_segmentation_export)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/mrseg_venv_2/lib/python3.12/site-packages/nnunetv2/inference/predict_from_raw_data.py", line 361, in predict_from_data_iterator
    proceed = not check_workers_alive_and_busy(export_pool, worker_list, r, allowed_num_queued=2)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx/mrseg_venv_2/lib/python3.12/site-packages/nnunetv2/utilities/file_path_utilities.py", line 103, in check_workers_alive_and_busy
    raise RuntimeError('Some background workers are no longer alive')
RuntimeError: Some background workers are no longer alive

Also, if I assign a specific fold with --fold 1, it gets stuck after predicting, showing: Done with image of shape torch.Size([1, 805, 233, 333]):

hhaentze commented 2 months ago

This might be caused by an out-of-memory error. Can you try running:

mrsegmentator --input "$data_path" --outdir "$output_path" --split_level 2

Chuyun-Shen commented 2 months ago

That works for me. However, --split_level 2 seems to force the batch size to 1. I've checked both my RAM and GPU memory usage, and neither is maxed out. Is there a way to see more specific error output? Also, do you know of any faster inference methods? It currently takes too long to process a single image when I run mrsegmentator --input "$data_path" --outdir "$output_path" --split_level 2

hhaentze commented 2 months ago

Yes, split_level and batch_size are mutually exclusive. The purpose of split_level is to reduce memory usage, and it should only be used when a batch size of 1 is still too large. If your memory is not maxed out, you could try setting split_level to 1, which requires approximately double the memory of level 2. Also, if you use Slurm, it helps to increase the number of workers, as nnUNet is heavily bottlenecked by CPU-based pre- and post-processing.
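For example, a sketch of the less aggressive split setting (paths are placeholders; only the --split_level flag from this thread is used):

```shell
# Split each image into fewer chunks than --split_level 2;
# needs roughly twice the memory, but processes larger pieces at once
mrsegmentator --input "$data_path" --outdir "$output_path" --split_level 1
```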

You can slightly reduce runtime by choosing a single fold instead of the ensemble prediction, by specifying --fold 0. On my system the runtime benefit is not that large, so I personally prefer the standard ensemble configuration to get that last bit of accuracy.

(If you are experienced with nnUNet you could also play around with the --nproc and --nproc_export options. That said, the standard configuration worked best on my system)
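Putting those suggestions together, a hedged sketch of a faster single-fold run (the worker counts are illustrative values, not tested recommendations):

```shell
# Single fold instead of the ensemble, with more CPU workers for
# nnUNet's pre-/post-processing (--nproc) and segmentation export (--nproc_export)
mrsegmentator --input "$data_path" --outdir "$output_path" \
    --fold 0 --nproc 8 --nproc_export 8
```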

Chuyun-Shen commented 2 months ago

Thanks for your detailed response.