Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998)

bkline commented 2 years ago

Current versions of Python write a deprecation warning message to stderr, which breaks CGI scripts running under web servers which fold stderr into stdout, and likely breaks other software. This change replaces the deprecated calls with PyUnicode_New(size, maxchar). The accompanying code to populate the new objects has also been rewritten to use the new PyUnicode APIs.

I have included a unit test, but I assume others more familiar with the project will want to decide how to integrate it with the existing tests. I didn't fold the test in with any of the other sets, because the only sets in tests3 which were not DBMS-specific were for tests of the API, and this isn't a fix of API behavior (or rather, it's a fix of internals related to working with the Python APIs). The test fails without the fix and passes with it.

This is my second shot at a PR for this issue. My first attempt took the instructions in the deprecation warning and in the documentation for PyUnicode_New() at face value, but that wasn't sufficient, and the pipeline tests didn't all pass. So I dug deeper into the documentation for all of the PyUnicode APIs and PEP 393 and did some more extensive rewriting of the two parts of pyodbc which were calling PyUnicode_FromUnicode(NULL, size). Assuming this second attempt passes the pipeline tests, I would very much like some very close scrutiny of this patch. I am very confident that my experience with the Python C APIs is much less extensive than the seasoned maintainers for the project, and the documentation is thinner than I would have liked, and downright buggy in places (for example, the documentation claims that PyUnicode_CopyCharacters() returns 0 on success, but I figured out from digging through the Python API's own C code that the function actually returns the number of characters copied; note also the mixup in the argument names—the second to should be from).

int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *to, Py_ssize_t from_start, Py_ssize_t how_many)

Copy characters from one Unicode object into another. This function performs character conversion when necessary and falls back to memcpy() if possible. Returns -1 and sets an exception on error, otherwise returns 0.

Fixes #998

gordthompson commented 2 years ago

With increasing adoption of Python 3.10 this will affect more and more people.

@mkleehammer , @v-chojas - Care to have a look?

mkleehammer commented 2 years ago

Wow, lots of great work. Thanks. I'll spend some time looking through this carefully.

mkleehammer commented 2 years ago

I wonder - as nice as this is, would it be better to hurry the v5? A big part of the complexity is dealing with multiple encodings due to 2.7. Python itself can deal with merging the different encodings into a single string trivially when using ''.join(a). We could rely on Python more if we had a single version and possibly if we had the core in Python instead of C.

Thoughts?

bkline commented 2 years ago

I wonder - as nice as this is, would it be better to hurry the v5?

Well, sure. I only opened #998 (and the PRs) because I had never gotten a response to the question I asked last year in #917.

Should I open a separate ticket to get this fixed on a faster track than https://github.com/mkleehammer/pyodbc/issues/945?

Not complaining. We're all busy. 😉

eabase commented 2 years ago

Any reason why this PR is not yet merged?

Is it possible this is also breaking the output of Pandas output when read into a DF from fetching an SQL result?

I'm referring to these SO questions:

alanblyth commented 2 years ago

This is a blocker for me, would be great to see this merged.

mkleehammer / pyodbc

Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002