find_tables OOM - GitHub Issues

Description of the bug

When I call page.find_tables() asynchronously on 5 pages at a time, and all 5 pages contain figures, memory usage keeps increasing as further pages with figures are processed, until the process runs out of memory (OOM).

How to reproduce the bug

import asyncio
import logging
import time
from pathlib import Path

import fitz  # PyMuPDF

logger = logging.getLogger(__name__)

# Both functions are methods of the same class (the class itself is not shown).
async def _parsing_pdf(self, file: Path):
    tasks = []
    futures = []
    start_time = time.time()
    document = fitz.open(file)
    for page in document:
        page_number = page.number
        tasks.append(self._process_page(page, file))
        # Process the pages in batches of 5.
        if len(tasks) >= 5:
            completed_futures = await asyncio.gather(*tasks)
            tasks.clear()
            futures.extend(completed_futures)
    # Process the remaining pages (fewer than 5).
    completed_futures = await asyncio.gather(*tasks)
    futures.extend(completed_futures)

async def _process_page(self, page, file_path):
    page_index = page.number
    image_list = page.get_images()
    table_finder = page.find_tables(strategy='lines_strict')
    table_list = table_finder.tables
    if len(image_list) == 0 and len(table_list) == 0:
        logger.info(f"page {page_index+1} has no image and table")
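
One way to quantify the growth described above is to log the peak resident memory of the process after each batch of 5 pages. The following is only a minimal sketch under stated assumptions: `peak_rss_mib`, `run_in_batches` and `fake_page_worker` are hypothetical names that are not part of the report or of PyMuPDF, and the worker is a stand-in so that the script runs without a PDF. It relies on the standard-library `resource` module, which is available on macOS and Linux but not on Windows.

```python
import asyncio
import resource
import sys


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB.

    ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


async def run_in_batches(items, worker, batch_size=5):
    """Await worker(item) for every item, batch_size coroutines at a time.

    Returns (results, peaks): the results in input order and the peak RSS
    recorded after each batch.
    """
    results, peaks = [], []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
        peaks.append(peak_rss_mib())
    return results, peaks


async def fake_page_worker(page_number):
    # Hypothetical stand-in for _process_page. When reproducing the issue,
    # replace the body with the real page.get_images() and
    # page.find_tables() calls.
    return page_number + 1


if __name__ == "__main__":
    results, peaks = asyncio.run(run_in_batches(list(range(12)), fake_page_worker))
    for batch_index, peak in enumerate(peaks):
        print(f"batch {batch_index}: peak RSS {peak:.1f} MiB")
```

Because `ru_maxrss` is a high-water mark, the logged values never decrease. A peak that keeps climbing from batch to batch over a long document would be consistent with the behavior reported here, whereas a peak that levels off after the first few batches would not.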

PyMuPDF version

1.23.x or earlier

Operating system

MacOS

Python version

3.11

pymupdf / PyMuPDF

find_tables OOM #3607
