Don't crash in SQL_ASCII DBs in PyString_AsString

mfenniak commented 2 months ago

When a PostgreSQL database is running in SQL_ASCII encoding, anything in multicorn that touches PyString_AsString or PyString_AsStringAndSize will encounter an encoding error because "SQL_ASCII" isn't a valid Python encoding; and then the errorCheck() function will attempt to get the error that occurred, call reportException, which calls PyString_AsString, resulting in infinite recursion until the process segfaults.

It seems that there's already a workaround in the form of the getPythonEncodingName name which translates SQL_ASCII to "ascii", used in some functions, but not in these ones; so I've applied those here.

Reproduction steps:

Create a DB with initdb -E SQL_ASCII

Run the a schema import from TestForeignDataWrapper, or run a DELETE command, for example:

CREATE EXTENSION multicorn;
CREATE SERVER multicorn_srv foreign data wrapper multicorn options (
wrapper 'multicorn.testfdw.TestForeignDataWrapper'
);
CREATE FOREIGN TABLE testmulticorn (
test1 character varying,
test2 character varying
) server multicorn_srv options (
option1 'option1'
);
DELETE FROM testmulticorn;

Behavior before this patch: segfault; when debugged this codepath can be found (for the schema import scenario, if I remember correctly):

#130945 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130946 0x00007fd6ff0fe6b2 in PyString_AsString (unicode=unicode@entry=0x7fd6fe8efbf0) at src/python.c:173
#130947 0x00007fd6ff0fe294 in reportException (pErrType=<optimized out>, pErrValue=<optimized out>, pErrTraceback=<optimized out>) at src/errors.c:50
... repeat a few thousand times ...
#130948 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130949 0x00007fd6ff0fe6b2 in PyString_AsString (unicode=unicode@entry=0x7fd6fe8efc70) at src/python.c:173
#130950 0x00007fd6ff0fe294 in reportException (pErrType=<optimized out>, pErrValue=<optimized out>, pErrTraceback=<optimized out>) at src/errors.c:50
#130951 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130952 0x00007fd6ff0fe6b2 in PyString_AsString (unicode=unicode@entry=0x7fd6fe8e78c0) at src/python.c:173
#130953 0x00007fd6ff0fe294 in reportException (pErrType=<optimized out>, pErrValue=<optimized out>, pErrTraceback=<optimized out>) at src/errors.c:50
#130954 0x00007fd6ff0fe4be in errorCheck () at src/errors.c:31
#130955 0x00007fd6ff0fe7fb in getClass (className=className@entry=0x7fd6fe8a4450) at src/python.c:210
#130956 0x00007fd6ff0fe9b5 in getClassString (className=className@entry=0x558da58a7820 "dynamodbfdw.dynamodbfdw.DynamoFdw") at src/python.c:310
#130957 0x00007fd6ff1042f1 in multicorn_validator (fcinfo=<optimized out>) at src/multicorn.c:225
#130958 0x0000558da50c715a in FunctionCall2Coll ()
#130959 0x0000558da50c780d in OidFunctionCall2Coll ()
#130960 0x0000558da4de1c9d in transformGenericOptions ()
#130961 0x0000558da4de27b9 in CreateForeignServer ()
#130962 0x0000558da4fa3c2c in ProcessUtilitySlow.isra.0 ()
#130963 0x0000558da4fa2ce0 in standard_ProcessUtility ()
#130964 0x0000558da4fa154f in PortalRunUtility ()
#130965 0x0000558da4fa166b in PortalRunMulti ()
#130966 0x0000558da4fa1ac1 in PortalRun ()
#130967 0x0000558da4f9e1a8 in exec_simple_query ()
#130968 0x0000558da4fa047a in PostgresMain ()
#130969 0x0000558da4f1f801 in ServerLoop ()
#130970 0x0000558da4f207eb in PostmasterMain ()
#130971 0x0000558da4ca4c8a in main ()

Behavior after: both an import and a delete works without segfault, at least.

ShaheedHaque commented 2 months ago

I've not tested this, but am happy to merge in a few days unless vetoed by @luss.

Question: let's say your efforts under #44 were merged. What would a new test based on the notes above look like? (This may be better answered in the PR for #44 to avoid blocking this PR).

luss commented 2 months ago

I thnk it looks great and very much welcome shaheed doing the needful

On Sat, May 11, 2024 at 2:52 AM ShaheedHaque @.***> wrote:

I've not tested this, but am happy to merge in a few days unless vetoed by @luss https://github.com/luss.

Question: let's say your efforts under #44 https://github.com/pgsql-io/multicorn2/discussions/44 were merged. What would a new test based on the notes above look like?

— Reply to this email directly, view it on GitHub https://github.com/pgsql-io/multicorn2/pull/43#issuecomment-2105600672, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMWOHSOM63LWFCBDNQESH3ZBW52ZAVCNFSM6AAAAABHP2DMW6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBVGYYDANRXGI . You are receiving this because you were mentioned.Message ID: @.***>

mfenniak commented 2 months ago

Question: let's say your efforts under #44 were merged. What would a new test based on the notes above look like? (This may be better answered in the PR for #44 to avoid blocking this PR).

In #46 I've had to code the test DB to run under "UTF8" in order to avoid this bug -- https://github.com/pgsql-io/multicorn2/blob/b50d02dd2a565708771e35d928dbd9676f187709/Makefile#L143. So a regression test to cover this case would either be removing that hack 🤣, or even better, running the test suites under a small collection of targeted/supported encodings.

I'll likely remove that in #46 after merging this.

Based upon the positive reviews and having addressed the C local comment, I'm going to make use of @luss's newly granted merge rights to go ahead with this. Thanks!

pgsql-io / multicorn2

Don't crash in SQL_ASCII DBs in PyString_AsString #43