mist-medical / MIST

MIST: A simple, scalable, and end-to-end framework for 3D medical imaging segmentation.
Apache License 2.0

Crash when validation batches can't distribute evenly across all GPUs #31

Status: Closed (Acasteo closed this issue 1 month ago)

Acasteo commented 1 month ago

Describe the bug: When using a small dataset with multiple GPUs, the pipeline crashes if (# validation data)/2 < # GPUs.

To Reproduce: Run mist_run_all with all default arguments on a small dataset (for example, 40 images) and 5 or more GPUs.

Expected behavior: The pipeline should check that the batch size does not exceed the validation dataset size and, if it does, output a warning/error telling the user to reduce the number of GPUs, similar to the existing check that batch size % n_gpus == 0.

Screenshots: [image: Screenshot 2024-09-13 at 10 08 22 AM]

aecelaya commented 1 month ago

This has been on my to-do list for a little while now.

Since the validation set batch size is always equal to one, it should give you an error if the number of validation examples is less than the number of GPUs.

I updated run.py to raise a ValueError if the number of validation images is less than the number of GPUs being used.
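For anyone who wants to guard their own scripts in the meantime, here is a rough sketch of what such a check could look like. This is not the actual code in run.py; the function name `check_gpu_configuration` and its parameters are made up for illustration. It only assumes what is described above: a batch size that must be divisible by the number of GPUs, and a validation set that must have at least one image per GPU.

```python
def check_gpu_configuration(num_val_images: int, batch_size: int, n_gpus: int) -> None:
    """Hypothetical pre-flight check; names are illustrative, not MIST's API."""
    if n_gpus < 1:
        raise ValueError(f"Expected at least one GPU, got n_gpus={n_gpus}.")

    # Mirrors the existing style of check mentioned in this issue:
    # the training batch must split evenly across GPUs.
    if batch_size % n_gpus != 0:
        raise ValueError(
            f"Batch size ({batch_size}) must be divisible by the "
            f"number of GPUs ({n_gpus})."
        )

    # Validation uses one image per GPU at a time, so every GPU needs
    # at least one validation image.
    if num_val_images < n_gpus:
        raise ValueError(
            f"Number of validation images ({num_val_images}) is less than "
            f"the number of GPUs ({n_gpus}). Reduce the number of GPUs."
        )
```

Calling `check_gpu_configuration(num_val_images=3, batch_size=8, n_gpus=4)` would raise a ValueError before any training starts, instead of crashing partway through validation.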