python / cpython

Exception ignored in tp_clear of: <class 'memoryview'> #110408

Open wg-postalgia opened 1 year ago

wg-postalgia commented 1 year ago

Bug report

Bug description:

Searching for this error, the only reference I found was a previous conversation in which the issue was considered theoretical and not reproducible.

I made a small script that reproduces the error. Tracking it down was frustrating: the debugger does not catch the error, and the traceback points at seemingly random code, so it is hard to tell what triggers it. I slowly pruned my code down to this minimal case.

The main code iterates through a PDF file using an iterator that returns the next page of the PDF (rendered via poppler) as an image at a given DPI. For each page, a Future is created with ProcessPoolExecutor; the submitted function sleeps briefly and returns.

Both the poppler-backed iterator AND the futures are required to trigger the error.

I have solved the problem in my code by removing the iterator, but I spent the time to create this reproduction in case it helps someone track down the bug.

Prerequisite: https://pdf2image.readthedocs.io/en/latest/installation.html

import functools
import threading
import time
from pdf2image import convert_from_path, pdfinfo_from_path
from concurrent.futures import ProcessPoolExecutor

class Pdf2ImageIterator:

    def __init__(self, pdf_path: str, poppler_path: str, dpi_high: int):
        self._path = pdf_path
        self._dpi_high = dpi_high
        self._poppler_path = poppler_path
        self.count = pdfinfo_from_path(self._path, None, None, poppler_path=self._poppler_path)["Pages"]

    def __iter__(self):
        self._page = 1
        return self

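    def __next__(self):
        # Stop after the last page; without this check the for-loop never terminates.
        if self._page > self.count:
            raise StopIteration
        try:
            # Render only the current page; convert_from_path returns a list of images.
            page_h = convert_from_path(self._path, poppler_path=self._poppler_path, dpi=self._dpi_high,
                                        first_page=self._page, last_page=self._page)
            self._page += 1
            return page_h
        except Exception as e:
            print(f"Pdf2Images {e}")
            # End the iteration on failure instead of yielding None forever.
            raise StopIteration from e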

def my_done_callback(i, image, future):
    print(f"my_done_callback on {threading.get_ident()}")

def my_future(i, image):
    print(f"my_future on {threading.get_ident()}")
    time.sleep(1)
    return 0

if __name__ == '__main__':
    ppe = ProcessPoolExecutor(None)

    poppler_path = r"C:\U<YOUR-PATH>\poppler-23.01.0\Library\bin"
    pdf = r"<PDF file with multiple pages>"
    pages = Pdf2ImageIterator(pdf, poppler_path, 200)

    for i, im in enumerate(pages):
        future = ppe.submit(my_future, i, im)
        future.add_done_callback(functools.partial(my_done_callback, i, im))

    print("Done")

Example console logs:

Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
  File "...\plugins\python-ce\helpers\pydev\pydevd_tracing.py", line 56, in _internal_set_trace
    filename = frame.f_back.f_code.co_filename.lower()
BufferError: memoryview has 1 exported buffer

Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
  File "...Programs\Python\Python310\lib\threading.py", line 568, in set
    with self._cond:
BufferError: memoryview has 1 exported buffer

Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
  File "...\plugins\python-ce\helpers\pydev\pydevd_tracing.py", line 56, in _internal_set_trace
    filename = frame.f_back.f_code.co_filename.lower()
BufferError: memoryview has 1 exported buffer
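
For readers unfamiliar with the message itself: the sketch below is not a reproduction of this bug, only a minimal illustration of the condition that `BufferError: memoryview has 1 exported buffer` describes. It assumes `pickle.PickleBuffer` (Python 3.8+) as a convenient stand-in for "some other object that currently holds a buffer obtained from the memoryview"; the variable names are made up for illustration. In the reported logs the release is attempted by the garbage collector (tp_clear) rather than by an explicit `release()` call.

```python
import pickle

data = bytearray(b"example")
view = memoryview(data)

# PickleBuffer requests a buffer from `view`, so `view` now has one export.
exported = pickle.PickleBuffer(view)

try:
    # Releasing a memoryview while a consumer still holds its buffer fails.
    view.release()
except BufferError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)

# Releasing the consumer first drops the export; the view can then be released.
exported.release()
view.release()
released_cleanly = True
```

The order matters: the consumer has to let go of the buffer before the memoryview can be released, which is presumably what goes wrong when the collector clears the objects in the opposite order.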

Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32

CPython versions tested on:

3.10

Operating systems tested on:

Windows

gaogaotiantian commented 1 year ago

Are you able to reproduce this without the debugger (running a plain Python script from the command line, like python yourscript.py)? I'm assuming you are using PyCharm or some other IDE because the exception was raised from pydev.

wg-postalgia commented 1 year ago

I am using PyCharm and cannot reproduce it without the debugger. Does that mean it's a debugger bug? I'm not sure where to re-file this report; any suggestions? Thank you.

gaogaotiantian commented 1 year ago

I believe this is a dup of #77894. I think this is still a CPython issue: the debugger should not be able to do anything that crashes Python. However, without a simple repro that involves "only CPython", it's not easy to track down the issue, and understanding the problem will require plenty of work.

wg-postalgia commented 1 year ago

Agreed, yes, that's the original conversation I referenced. Sorry, I didn't have the correct migrated link. Happy to close this as a duplicate and leave it for anyone who feels like investigating in the future?

It's kind of fun to reproduce a bug that was only theoretical - at least now that I have a fix anyways :)

cyc1111111111 commented 1 year ago

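Agreed, yes that's the original conversation I referenced - sorry I didn't have the correct migrated link. Happy to close this as duplicate and leave for anyone who feels like investigating this in the future?

It's kind of fun to reproduce a bug that was only theoretical - at least now that I have a fix anyways :)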

Have you solved this problem? I had the same problem

wg-postalgia commented 1 year ago

I thought my problem was solved, but it seems to be a timing issue and the problem still occurs randomly. When it does, the debugger throws the exception and all of the ProcessPool processes have to be stopped manually with Task Manager.

I have stopped using the debugger, which completely resolves the issue. If you need to debug, you can try setting the number of CPUs to 1; that seemed to work for me.
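
A minimal sketch of that workaround, assuming "number of CPUs" means the `max_workers` argument of `ProcessPoolExecutor`. The helper names (`debugger_attached`, `pick_max_workers`, `square`) are hypothetical, and checking `sys.gettrace()` is only a heuristic for trace-function-based debuggers, not something confirmed in this thread:

```python
import sys
from concurrent.futures import ProcessPoolExecutor


def debugger_attached() -> bool:
    # Heuristic (an assumption): debuggers built on trace functions, such as
    # pydevd, install one, and sys.gettrace() then returns it.
    return sys.gettrace() is not None


def pick_max_workers(debugging: bool):
    # One worker while debugging; None lets ProcessPoolExecutor pick its default.
    return 1 if debugging else None


def square(x: int) -> int:
    # Stand-in for the real per-page work.
    return x * x


if __name__ == "__main__":
    workers = pick_max_workers(debugger_attached())
    with ProcessPoolExecutor(max_workers=workers) as ppe:
        print(list(ppe.map(square, range(4))))
```

Passing `max_workers=1` unconditionally while debugging works just as well; the helper only avoids having to edit the code when switching between debugging and normal runs.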


ForeverFerret commented 9 months ago
ProcessPoolExecutor || multiprocessing.Pool

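Thank you! Your answer helped me as well! In my testing, PyCharm Professional 2023.3.2 has this problem; I then tried 2022.2.5 and 2022.1.4, and both have the same problem! Debugging with VSCode, however, works without any issue, so I suspect that PyCharm does some special handling of "dev" that causes the problem.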

dnparadice commented 8 months ago

+1 this is annoying, debugger should handle multiprocessing more elegantly

tobiaswuerth commented 5 months ago

+1