pgsql-io / multicorn2

http://multicorn2.org
Other
73 stars 16 forks source link

log_to_postgres(level=ERROR) / PostgreSQL exceptions poorly handled (Python 3.11 support) #58

Closed mfenniak closed 1 month ago

mfenniak commented 1 month ago

I've been trying to add support for Python 3.11 to multicorn2, and struggling with unexpected crashes. I believe I've identified the core of the problem.

When the Python function log_to_postgres is used with the level ERROR, as it is wisely used in some FDWs to report extended information about a problem... for example: https://github.com/pgsql-io/multicorn2/blob/b0a274c3aeec8341f93ec3f8328cad9f105ae4ee/python/multicorn/fsfdw/__init__.py#L286-L291

This executes errstart/errfinish with a level of ERROR, of course... https://github.com/pgsql-io/multicorn2/blob/b0a274c3aeec8341f93ec3f8328cad9f105ae4ee/src/utils.c#L84-L88

And as a result, this begins invoking PostgreSQL's C-based exception handling code, which uses setjmp/longjmp to move execution to the current error handler block (PG_TRY()...PG_CATCH()...PG_END_TRY()). However, the Python interpreter is in the middle of executing a C function call and its state is completely corrupt when control is never returned to the interpreter.

The specific interaction that is breaking right now is:

Apparently in Python 3.10 and earlier the interpreter is able to cope with this, which is somewhat of a minor miracle to me. But in 3.11, by the time we re-enter the interpreter after this abrupt exit, it doesn't know how to cope and it begins to access invalid memory, crashing shortly afterwards.

I think the right fix for this would be to:

This would have the benefit of unwinding the Python stack -- for example any finally blocks and context managers that are in-use in Python would be exited cleanly. And I think it would leave the interpreter in a good clean state for re-entry later.

I think that this would still leave a residual problem which is likely "OK"-ish: any other place that could trigger PostgreSQL exceptions is going to fail to perform Python reference counting correctly. For example any invocation of errorCheck() in the code base today could trigger Postgres's exception handling which will prevent the following Py_DECREF from being invoked. That could lead to memory leaks in any long-running backends using multicorn, but, I think it's not of critical importance.