microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

Memory leak in using table detection #103

Open Srivishnu27feb opened 1 year ago

Srivishnu27feb commented 1 year ago

Hi Team, I have a 20-page PDF. I am using the pdf2image library to convert it to images and passing each image to detection using threads. Memory piles up and is not deallocated until the entire application exits. While debugging, I found that the memory piles up whenever table detection or table structure detection runs. In my Flask application I load the model once and reuse it in my function to run inference on each page of the PDF. I also tried gc.collect() and del on the variables, but no luck. I am currently using the Hugging Face implementation. Is there any workaround that could help release the memory?
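For readers trying to reproduce this, the setup described above might look roughly like the sketch below. It is not the poster's actual code. `detect_tables_in_pdf` and `detect_fn` are hypothetical names: `detect_fn` stands in for whatever function wraps the already-loaded Hugging Face model, and `pages` stands in for the list of page images produced by pdf2image.

```python
import gc
from concurrent.futures import ThreadPoolExecutor


def detect_tables_in_pdf(pages, detect_fn, max_workers=4):
    """Run detect_fn on every page image using a thread pool.

    pages:     list of page images (assumed to come from pdf2image)
    detect_fn: hypothetical callable wrapping the model, loaded once elsewhere
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input pages
        results = list(pool.map(detect_fn, pages))
    # Drop this function's reference to the page images and force a
    # collection pass, mirroring the del + gc.collect() attempt above.
    # The caller's own reference to the list is not affected.
    del pages
    gc.collect()
    return results
```

Note that `del` and `gc.collect()` can only free objects that nothing else references, so anything still held by the caller, the model, or a cache stays allocated.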

balajiChundi commented 5 months ago

I am facing a similar issue: memory gets allocated during requests and is not deallocated afterwards, which eventually crashes the application. I also profiled memory while sending multiple requests. Memory usage is constant before the request stream and constant again after it, but the amount the application holds on to has increased (please check the attached image).

[Image: plot_predict_tatr]
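A before/after comparison like the one described above can be sketched with the standard library's `tracemalloc`. This is only an illustration, not the profiling setup used for the plot. `measure_retained_memory` and `handle_request` are hypothetical names, and `handle_request` stands in for one call to the prediction endpoint. One caveat: `tracemalloc` only sees allocations made through Python's allocator, so native tensor memory held by PyTorch would not show up here and process-level memory (as in the plot) still needs to be checked separately.

```python
import gc
import tracemalloc


def measure_retained_memory(handle_request, n_requests=20):
    """Return Python-tracked bytes still held after a stream of requests.

    handle_request is a hypothetical callable taking the request index.
    """
    tracemalloc.start()
    try:
        gc.collect()
        before, _ = tracemalloc.get_traced_memory()
        for i in range(n_requests):
            handle_request(i)
        # Collect garbage so only memory that is still referenced is counted
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A result close to zero suggests the Python objects created per request are being released. A result that grows with `n_requests` points at something on the Python side still holding references.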