kevinjohncutler / omnipose

Omnipose: a high-precision solution for morphology-independent cell segmentation
https://omnipose.readthedocs.io

GPU usage low for extended periods of time #55

Closed: natelharrison closed this issue 1 year ago

natelharrison commented 1 year ago

Hi, I am running 3D training and my GPUs are spending most of their time at 0% utilization. I'm not sure if this is the expected behavior or if there is something I can do to address this.

Here is my command and log output:

(omnipose) [<user>@oblivion cellpose-image-processing]$ omnipose --use_gpu --train --dir /clusterfs/fiona/segmentation_curation/training_data/combined_dataset/processed/ --mask_filter _masks --n_epochs 4000 --pretrained_model None --learning_rate 0.1 --save_every 50 --save_each  --verbose --dim 3 --RAdam --batch_size 32 --diameter 0 --nclasses 4 --tyx 112,128,128
2023-08-21 20:48:23,322 [INFO] WRITING LOG OUTPUT TO /global/home/users/<user>/.cellpose/run.log
log file /global/home/users/<user>/.cellpose/run.log
2023-08-21 20:48:24,716 [INFO] ** TORCH GPU version installed and working. **
2023-08-21 20:48:24,717 [INFO] >>>> using GPU
Omnipose enabled. See Omnipose repo for licencing details.
2023-08-21 20:48:24,717 [INFO] Training omni model. Setting nclasses=4, RAdam=True
2023-08-21 20:48:25,341 [INFO] not all flows are present, will run flow generation for all images
2023-08-21 20:48:25,907 [INFO] setting nchan to 1. Be sure to use --nchan 1 when running the model.
2023-08-21 20:48:25,907 [INFO] training from scratch
2023-08-21 20:48:25,907 [INFO] median diameter set to 0 => no rescaling during training
2023-08-21 20:49:08,478 [INFO] No precomuting flows with Omnipose. Computed during training.
2023-08-21 20:49:13,629 [WARNING] channels is set to None, input must therefore have nchan channels (default is 2)
2023-08-21 20:49:13,685 [INFO] >>> Using RAdam optimizer
2023-08-21 20:49:13,685 [INFO] >>>> training network with 1 channel input <<<<
2023-08-21 20:49:13,686 [INFO] >>>> LR: 0.10000, batch_size: 32, weight_decay: 0.00001
2023-08-21 20:49:13,686 [INFO] >>>> ntrain = 48
2023-08-21 20:49:13,686 [INFO] >>>> nimg_per_epoch = 48
2023-08-21 20:49:13,802 [INFO] >>>> Start time: 20:49:13

Train epoch: 0 | Time: 14.94min | last epoch: 0.00s | <sec/epoch>: 0.00s | <sec/batch>: 437.92s | <Batch Loss>: 12.384790 | <Epoch Loss>: 14.067648
2023-08-21 21:04:10,114 [INFO] saving network parameters to /clusterfs/fiona/segmentation_curation/training_data/combined_dataset/processed/models/cellpose_residual_on_style_on_concatenation_off_omni_nclasses_6_nchan_1_processed_2023_08_21_20_49_13.629596_epoch_0

Train epoch: 2 | Time: 43.94min | last epoch: 880.37s | <sec/epoch>: 859.77s | <sec/batch>: 438.36s | <Batch Loss>: 8.364649 | <Epoch Loss>: 9.324625
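Epochs this long with idle GPUs usually mean the time is going into CPU-side work, consistent with the log line above saying flows are computed during training rather than precomputed. A minimal, hedged way to check where the time goes, assuming you can wrap your own loop (`prepare_batch` and `gpu_step` are hypothetical stand-ins, not Omnipose API):

```python
import time

def profile_loop(prepare_batch, gpu_step, n_batches):
    """Split wall time into CPU-side batch preparation vs. the GPU step,
    to see which one dominates. Both callables are placeholders for
    your own training loop's phases."""
    prep_total = step_total = 0.0
    for _ in range(n_batches):
        t0 = time.perf_counter()
        batch = prepare_batch()   # e.g. on-the-fly flow generation on the CPU
        t1 = time.perf_counter()
        gpu_step(batch)           # forward/backward pass on the GPU
        t2 = time.perf_counter()
        prep_total += t1 - t0
        step_total += t2 - t1
    total = prep_total + step_total
    return {"prep_frac": prep_total / total,
            "step_frac": step_total / total}
```

If `prep_frac` comes out near 1.0, the GPUs are starved by data preparation, and precomputing or caching the flows (or adding loader workers) is the direction to look.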

And here is my GPU usage: [screenshot: utilization graphs mostly flat at 0%]
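For anyone hitting the same thing: a quick way to quantify the idle time is to log utilization with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader -l 1 > gpu.log` while training, then average the samples. A small sketch of the averaging step (the helper below is mine, not part of Omnipose):

```python
def mean_gpu_utilization(lines):
    """Average GPU utilization from nvidia-smi CSV samples,
    where each line looks like '0 %' or '87 %'."""
    vals = [int(line.strip().rstrip("%").strip())
            for line in lines if line.strip()]
    return sum(vals) / len(vals) if vals else 0.0

# e.g. with open("gpu.log") as f: print(mean_gpu_utilization(f))
```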

kevinjohncutler commented 1 year ago

Followed up via email, will comment on this later if there is some useful debugging to document.