mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

OMG Look at my CPU memory usage increase insanely. Memory leak issue. #1594

Closed · sanjay-nit closed this 1 month ago

sanjay-nit commented 1 month ago

Bug description

I'm running docTR in Google Colab on a T4 GPU. GPU memory usage stays constant, but CPU memory usage increases insanely when I run it in a loop over a list of images.

FYI: I've already tried the fix from https://github.com/mindee/doctr/discussions/1422, but it didn't help.

I'm using the docTR PyTorch backend: !pip install "python-doctr[torch]"

I'm using the environment variables below:

import os
import torch

os.environ['USE_TF'] = '0'
os.environ['USE_TORCH'] = '1'
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
os.environ["ONEDNN_PRIMITIVE_CACHE_CAPACITY"] = "1"

[screenshots of memory usage]

I've also tried https://github.com/felixdittrich92/OnnxTR?tab=readme-ov-file, but I'm running into issues there because it does not use the GPU.

Code snippet to reproduce the bug

from doctr.io import DocumentFile
from doctr.models import ocr_predictor
import torch

with torch.no_grad():
    doctr_model = ocr_predictor(
        pretrained=True
    ).cuda()

image_paths = []  # around 500 image paths
for image_path in image_paths:
    document = DocumentFile.from_images(image_path)
    result = doctr_model(document)
    json_response = result.export()

Error traceback

CPU memory usage keeps increasing until the session crashes.

Environment

DocTR version: v0.8.1
TensorFlow version: 2.15.0
PyTorch version: 2.2.1+cu121 (torchvision 0.17.1+cu121)
OpenCV version: 4.8.0
OS: Ubuntu 22.04.3 LTS
Python version: 3.10.12
Is CUDA available (TensorFlow): Yes
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.2.140
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 535.104.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.6
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.6

Additional information

torch 2.2.1+cu121
torchaudio 2.2.1+cu121
torchdata 0.7.1
torchsummary 1.5.1
torchtext 0.17.1
torchvision 0.17.1+cu121

Deep Learning backend

is_tf_available: False
is_torch_available: True

sanjay-nit commented 1 month ago

After running it on a LambdaLabs instance, look at the memory usage.

Can anybody help me with this? [screenshot of memory usage]

felixdittrich92 commented 1 month ago

@sanjay-nit solved ? :)

sanjay-nit commented 1 month ago

> @sanjay-nit solved ? :)

Hi @felixdittrich92, after a whole day of debugging I finally found that the issue was with the garbage collector. 😃 So sorry I tagged you so many times.

Basically, I was using TemporaryDirectory() and, just for testing, I loaded 400 PIL images into a list. Somehow, after exiting the context manager, gc was not able to clear the memory. I still don't know why: even after I manually deleted all the variables and the list, gc still couldn't free the memory.

In the end I stopped storing all the images in a list and instead ran OCR on them one by one, and that worked (a minimal sketch follows below).
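
For reference, here is a minimal sketch of that one-image-at-a-time pattern. The tempfile usage and the image_paths placeholder are illustrative assumptions, not the exact code from this thread:

import gc
import tempfile

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True).cuda()

with tempfile.TemporaryDirectory() as tmp_dir:
    # ... write or download the input images into tmp_dir ...
    image_paths = []  # paths to the images inside tmp_dir (placeholder)

    exports = []
    for image_path in image_paths:
        # Load and OCR one image at a time, keeping only the exported dict,
        # instead of holding hundreds of decoded PIL images in a Python list.
        document = DocumentFile.from_images(image_path)
        exports.append(model(document).export())
        del document
    gc.collect()  # optional; the key change is not accumulating images

The important difference from the failing setup is that only the small exported result is kept per image, so the decoded image buffers can be freed on each iteration.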

Thanks a lot @felixdittrich92 and sorry :)

felixdittrich92 commented 1 month ago


In the end, good to see that you were able to solve it 👍 :)