Closed by @foresthayes 10 months ago
Hi @foresthayes!
Thank you for submitting this issue. The batch size that you are defining in your loader
might be too high for your GPU's memory. Try reducing it and let us know if it works!
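For example, keeping the demo's loader setup and only lowering `batch_size` (a minimal sketch; use whatever dataset and transform you already have):

```python
from torch.utils.data import DataLoader

# `dataset` and `detection_model` are the same objects created earlier in the demo;
# only batch_size changes here (try 8, 4, or even 1 instead of 32).
loader = DataLoader(dataset, batch_size=8, shuffle=False,
                    pin_memory=True, num_workers=8, drop_last=False)
results = detection_model.batch_image_detection(loader)
```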
Thanks @aa-hernandez,
I have tried using different batch sizes (going as low as 1) and end up with fundamentally the same error.
Monitoring GPU memory usage with nvidia-smi shows it starting at ~3500 MiB when the batch run is initiated; usage then continues to climb as more images are processed until the GPU's memory is exceeded.
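(I watched this externally with nvidia-smi, e.g. `nvidia-smi -l 1`; just as a sketch, a rough equivalent from inside the process would be something like the following, called once per batch:)

```python
import torch

# Rough in-process check of the same growth nvidia-smi shows:
# memory PyTorch has allocated vs. reserved on the GPU, in MiB.
def log_cuda_memory(step):
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"batch {step}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")
```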
Hello @foresthayes, this sounds like a memory leak. Could you share some information about your hardware? We want to see whether we can reproduce this on our end. Thank you very much!
Hi @zhmiao, here is my hardware information; let me know if you need any additional details. Thank you for looking into this!
Processor: Intel i7-13700K
RAM: 64 GB DDR5
GPU: GeForce RTX 4090 (MSI Suprim Liquid X 24G)
Hello @foresthayes, thank you very much for sharing the information. We found a likely cause of the issue (https://github.com/pytorch/pytorch/issues/13246): using numpy inside a PyTorch dataset almost always leads to a memory leak. We are working on replacing the dataset's YOLO data transform functions that rely on numpy and will give you an update as soon as possible.
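Roughly, the idea is to keep loading, resizing, and tensor conversion entirely in PIL/torch so the dataset never holds numpy arrays between iterations. A minimal sketch of that direction (illustrative only, not the actual MegaDetector_v5_Transform code):

```python
import torch
from PIL import Image
from torchvision.transforms import functional as TF

# Illustrative numpy-free pipeline: PIL for decoding/resizing,
# torchvision/torch for the tensor conversion.
def load_and_transform(path, target_size=1280):
    img = Image.open(path).convert("RGB")
    img = TF.resize(img, target_size)  # PIL-based resize (short side -> target_size)
    return TF.to_tensor(img)           # float tensor in [0, 1], shape (C, H, W)
```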
Hello @foresthayes, we have updated our datasets to use only PIL and PyTorch functions in our newest commits, and memory usage is stable in our tests. Could you please take a look at the new code and check whether the memory leak still happens on your end? Thanks!
Hello @zhmiao, thank you for the update and fix! The memory leak is resolved on my end with the updated code (79dc6058e26c6888aa3f4892dc8d116c0004a221).
Thanks again for the help!
Hello,
I am running into an issue when using the demo code to process larger datasets (e.g., > 2000 images).
Specifically, running the batch image detection function results in a runtime error: "CUDA out of memory".
https://github.com/microsoft/CameraTraps/blob/e9edc7c05525a7cc5ab39ed62bf9c0770813fc9b/demo/image_demo.py#L65-L66
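For context, the relevant part of my script follows the demo (imports shown as I recall them from the demo; the rest is the same code that appears in the traceback below):

```python
from torch.utils.data import DataLoader
from PytorchWildlife.data import datasets as pw_data
from PytorchWildlife.data import transforms as pw_trans

# tgt_folder_path and detection_model are set up earlier, as in the demo.
dataset = pw_data.DetectionImageFolder(
    tgt_folder_path,
    transform=pw_trans.MegaDetector_v5_Transform(target_size=detection_model.IMAGE_SIZE,
                                                 stride=detection_model.STRIDE)
)
loader = DataLoader(dataset, batch_size=32, shuffle=False,
                    pin_memory=True, num_workers=8, drop_last=False)
results = detection_model.batch_image_detection(loader)
```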
Do you have any advice? Thanks!
Full error message
```
 20%|████████████████ | 62/313 [01:13<04:58, 1.19s/it]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 12
      5 dataset = pw_data.DetectionImageFolder(
      6     tgt_folder_path,
      7     transform=pw_trans.MegaDetector_v5_Transform(target_size=detection_model.IMAGE_SIZE,
      8                                                  stride=detection_model.STRIDE)
      9 )
     10 loader = DataLoader(dataset, batch_size=32, shuffle=False,
     11                     pin_memory=True, num_workers=8, drop_last=False)
---> 12 results = detection_model.batch_image_detection(loader)

File ~\.conda\envs\pytorch-wildlife\lib\site-packages\PytorchWildlife\models\detection\yolov5\base_detector.py:143, in YOLOV5Base.batch_image_detection(self, dataloader, conf_thres, id_strip)
    141 imgs, paths, sizes = batch
    142 imgs = imgs.to(self.device)
--> 143 total_preds.append(self.model(imgs)[0])
    144 total_paths.append(paths)
    145 total_img_sizes.append(sizes)

File ~\.conda\envs\pytorch-wildlife\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\.conda\envs\pytorch-wildlife\Lib\site-packages\yolov5\models\yolo.py:135, in Model.forward(self, x, augment, profile, visualize)
    133 if augment:
    134     return self._forward_augment(x)  # augmented inference, None
--> 135 return self._forward_once(x, profile, visualize)

File ~\.conda\envs\pytorch-wildlife\Lib\site-packages\yolov5\models\yolo.py:158, in Model._forward_once(self, x, profile, visualize)
    156 if profile:
    157     self._profile_one_layer(m, x, dt)
--> 158 x = m(x)  # run
    159 y.append(x if m.i in self.save else None)  # save output
    160 if visualize:

File ~\.conda\envs\pytorch-wildlife\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\.conda\envs\pytorch-wildlife\Lib\site-packages\yolov5\models\common.py:47, in Conv.forward(self, x)
     46 def forward(self, x):
---> 47     return self.act(self.bn(self.conv(x)))

File ~\.conda\envs\pytorch-wildlife\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~\.conda\envs\pytorch-wildlife\lib\site-packages\torch\nn\modules\batchnorm.py:168, in _BatchNorm.forward(self, input)
    161 bn_training = (self.running_mean is None) and (self.running_var is None)
    163 r"""
    164 Buffers are only updated if they are to be tracked and we are in training mode. Thus they only need to be
    165 passed when the update should occur (i.e. in training mode when they are tracked), or when buffer stats are
    166 used for normalization (i.e. in eval mode when buffers are not None).
    167 """
--> 168 return F.batch_norm(
    169     input,
    170     # If buffers are not to be tracked, ensure that they won't be updated
    171     self.running_mean
    172     if not self.training or self.track_running_stats
    173     else None,
    174     self.running_var if not self.training or self.track_running_stats else None,
    175     self.weight,
    176     self.bias,
    177     bn_training,
    178     exponential_average_factor,
    179     self.eps,
    180 )

File ~\.conda\envs\pytorch-wildlife\lib\site-packages\torch\nn\functional.py:2282, in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
   2279 if training:
   2280     _verify_batch_size(input.size())
-> 2282 return torch.batch_norm(
   2283     input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
   2284 )

RuntimeError: CUDA out of memory. Tried to allocate 3.91 GiB (GPU 0; 23.99 GiB total capacity; 11.06 GiB already allocated; 148.00 MiB free; 21.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```