mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.92k stars 563 forks source link

pgtests test_emoticons_as_literal fails #1004

Closed bkline closed 2 years ago

bkline commented 2 years ago

Environment

Issue

The test_emoticons_as_literal() test fails for the pgtests.py suite.

======================================================================
FAIL: test_emoticons_as_literal (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 625, in test_emoticons_as_literal
    self.assertEqual(result, v)
AssertionError: 'x � z' != 'x 🌜 z'
- x � z
?   ^
+ x 🌜 z
?   ^

This is similar to the failure reported for another test suite by https://github.com/mkleehammer/pyodbc/issues/536. Interestingly, if I run the tests with the --ansi flag set, this test passes, but the test_output_conversion test fails, because apparently setting ansi=True in the pyodbc.connect() call results for some reason in the return value from the local convert() function not being used (or perhaps that local function isn't invoked at all)

======================================================================
FAIL: test_output_conversion (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 671, in test_output_conversion
    self.assertEqual(value, 'X123.45X')
AssertionError: '123.45' != 'X123.45X'
- 123.45
+ X123.45X
? +      +

However, if I comment out the setencoding() call:

diff --git a/tests3/pgtests.py b/tests3/pgtests.py
index dcb03a9..5dfacc9 100755
--- a/tests3/pgtests.py
+++ b/tests3/pgtests.py
@@ -70,7 +70,7 @@ class PGTestCase(unittest.TestCase):

         # I've set my test database to use UTF-8 which seems most popular.
         self.cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
-        self.cnxn.setencoding(encoding='utf-8')
+        # self.cnxn.setencoding(encoding='utf-8')

         # As of psql 9.5.04 SQLGetTypeInfo returns absurdly small sizes leading
         # to slow writes.  Override them:

... and omit the --ansi flag in the test suite invocation, then all the tests pass (except for the error reported by #1003).

Library: /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
/Users/bkline/repos/pyodbc/tests3 --> /Users/bkline/repos/pyodbc
/Users/bkline/repos/pyodbc/tests3/pgtests.py:519: DeprecationWarning: PyUnicode_FromUnicode(NULL, size) is deprecated; use PyUnicode_New() instead
  result = str(row)
======================================================================
ERROR: test_columns (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 575, in test_columns
    assert row.precision == 3, row.precision
AttributeError: 'pyodbc.Row' object has no attribute 'precision'

----------------------------------------------------------------------
Ran 79 tests in 4.181s

FAILED (errors=1)
gordthompson commented 2 years ago

Try running the tests with the -v switch, e.g,

$ python3 tests3/pgtests.py -v

That will show details about the test environment, specifically the driver name and version

driver:  psqlodbcw.so 12.01.0000

What does it display for you?

bkline commented 2 years ago

Appreciate your following up on these tickets so promptly. Here the verbose output.

% python3 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
/Users/bkline/repos/pyodbc/tests3 --> /Users/bkline/repos/pyodbc
python:  3.10.1 (main, Dec  6 2021, 23:20:29) [Clang 13.0.0 (clang-1300.0.29.3)]
pyodbc:  4.0.33b6 /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
odbc:    03.52
driver:  psqlodbcw.so 13.02.0000
         supports ODBC version 03.51
os:      Darwin
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_emoticons_as_literal (__main__.PGTestCase) ... FAIL

======================================================================
FAIL: test_emoticons_as_literal (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 625, in test_emoticons_as_literal
    self.assertEqual(result, v)
AssertionError: 'x � z' != 'x 🌜 z'
- x � z
?   ^
+ x 🌜 z
?   ^

----------------------------------------------------------------------
Ran 1 test in 0.066s

FAILED (failures=1)
gordthompson commented 2 years ago

Hmmm. As in the case of #1003 this one works for me with Python 3.10 as well.

$ python3.10 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
/home/gord/git/pyodbc/tests3 --> /home/gord/git/pyodbc
python:  3.10.1 (main, Dec 21 2021, 17:46:38) [GCC 9.3.0]
pyodbc:  4.0.33b7+test_with_dsn /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
odbc:    03.52
driver:  psqlodbcw.so 12.01.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_emoticons_as_literal (__main__.PGTestCase) ... ok

----------------------------------------------------------------------
Ran 1 test in 1.813s

OK
bkline commented 2 years ago

On Windows, too, apparently (given that my recent PR passed all the pipeline tests). So the tests are just broken on macOS (at least with the versions of the drivers and unixODBC that I'm running).

bkline commented 2 years ago

Given the recent comments in #1003 it doesn't seem unreasonable to wonder if the fact that we're running different versions of the psqlodbc driver is a more likely explanation of the discrepancy between our results for this test, than the different operating systems on which we're running. I'll hop onto a Linux system and run some tests to see if I can confirm (or debunk) this theory.

bkline commented 2 years ago

Rolled back psqlodbc temporarily to 12.01.0000 on my macOS system, but the test still fails unless I comment out the line self.cnxn.setencoding(encoding='utf-8') or set the --ansi flag. I did find this bug report which @keitherskine filed with pgsql-bugs, but as far as I can tell his report was ignored.

Testing on a Linux system, also running psqlodbcw.so 12.01.0000, the test passes (matching your results). Also, all of the tests pass on that system with the setencoding() call commented out, and that's true of the macOS system running with the same driver version. I don't have access to a Windows test box, but it might be useful to know how the tests come out on windows without the call to setencoding(). If they pass, it might be tempting to fix the failing test by just letting the tests run with the pyodbc encoding defaults.

It would be even nicer if we understood what the underlying cause of the difference in behavior is when the encoding default is overridden. The only difference I can see (after pointing both systems to the same PostgreSQL server) is that the macOS client is using unixODBC 2.3.9 and the Linux system is running unixODBC 2.3.7. Unfortunately, attempts to build a version of unixODBC which matched the version installed on the other client ran into too much of a dependency swamp to be feasible on either system.

gordthompson commented 2 years ago

On Windows with 12.01.0000:

PS C:\Users\Gord\git\pyodbc> mypython3-64 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
C:\Users\Gord\git\pyodbc\tests3 --> C:\Users\Gord\git\pyodbc
python:  3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
pyodbc:  4.0.33b6 C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 12.01.0000
         supports ODBC version 03.51
os:      Windows
unicode: Py_Unicode=2 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
         8.1 6.3.9600 SP0 Multiprocessor Free
test_emoticons_as_literal (__main__.PGTestCase) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.311s

OK
gordthompson commented 2 years ago

On Windows with 12.01.0000 and setencoding() disabled in pgtests.py:

PS C:\Users\Gord\git\pyodbc> mypython3-64 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
C:\Users\Gord\git\pyodbc\tests3 --> C:\Users\Gord\git\pyodbc
python:  3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
pyodbc:  4.0.33b6 C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 12.01.0000
         supports ODBC version 03.51
os:      Windows
unicode: Py_Unicode=2 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
         8.1 6.3.9600 SP0 Multiprocessor Free
test_emoticons_as_literal (__main__.PGTestCase) ... FAIL

======================================================================
FAIL: test_emoticons_as_literal (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Gord\git\pyodbc\tests3\pgtests.py", line 625, in test_emoticons_as_literal
    self.assertEqual(result, v)
AssertionError: 'x ?? z' != 'x 🌜 z'
- x ?? z
?   ^^
+ x 🌜 z
?   ^

----------------------------------------------------------------------
Ran 1 test in 0.149s

FAILED (failures=1)
bkline commented 2 years ago

Well, that's a bummer. So with some stacks it fails if you leave the encoding at the default, and with others it fails if you override the default encoding. Makes for a fragile test suite. 😒

bkline commented 2 years ago

I suppose one possibility would be to make the overriding of the default encoding dynamically based on the OS.

bkline commented 2 years ago

Here's a repro of the test (pulled out into a separate script) failing on Linux with Python 3.9.7, pyodbc 4.0.32, and version 13.00.0000 of psqlodbcw.so.

pyodbc-issue1004.tar.gz

tar -xf pyodbc-issue1004.tar.gz
cd issue1004
docker compose build
docker compose up -d
docker logs issue1004-repro-1
docker compose down

Output:

============================================================
with default encoding
============================================================
python:  3.9.7 (default, Sep 10 2021, 14:59:43) [GCC 11.2.0]
pyodbc:  4.0.32
odbc:    03.52
driver:  psqlodbcw.so 13.00.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
------------------------------------------------------------
'x 🌜 z' matches the expected value 'x 🌜 z'

============================================================
with encoding overridden
============================================================
python:  3.9.7 (default, Sep 10 2021, 14:59:43) [GCC 11.2.0]
pyodbc:  4.0.32
odbc:    03.52
driver:  psqlodbcw.so 13.00.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
------------------------------------------------------------
'x ð\x9f\x8c\x9c z' does not match the expected value 'x 🌜 z'
bkline commented 2 years ago

I found a Windows machine to test on.

============================================================
with default encoding
============================================================
python:  3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]
pyodbc:  4.0.32
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 13.02.0000
         supports ODBC version 03.51
os:      Windows 10 10.0.19041 SP0 Multiprocessor Free
unicode: Py_Unicode=2 SQLWCHAR=2
------------------------------------------------------------
'x 🌜 z' matches the expected value 'x 🌜 z'

============================================================
with encoding overridden
============================================================
python:  3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]
pyodbc:  4.0.32
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 13.02.0000
         supports ODBC version 03.51
os:      Windows 10 10.0.19041 SP0 Multiprocessor Free
unicode: Py_Unicode=2 SQLWCHAR=2
------------------------------------------------------------
'x 🌜 z' does not match the expected value 'x 🌜 z'
gordthompson commented 2 years ago

When you try to do the insert from your Mac with .setencoding() and .setdecoding(), e.g.,

cnxn.setdecoding(pyodbc.SQL_CHAR, encoding="utf-8")
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding="utf-8")
cnxn.setencoding(encoding="utf-8")

crsr = cnxn.cursor()

v = "x \U0001F31C z"

crsr.execute("CREATE TABLE t1 (s varchar(100))")
crsr.execute(f"insert into t1 values ('{v}')")

and then check the table from a separate tool like pgAdmin_4 does the inserted value look correct?

Edit:

Both pgAdmin_4 and DBeaver appear to render the column contents as x 🌜 z, which are the correct bytes for UTF-8 encoding but not the Unicode character itself. :frowning_man:

bkline commented 2 years ago

The inserted character is not correct, so the problem happens going into the DB, not coming out. Sort of what I expected, given the semantics of "encode" and "decode" (commenting out just the setencoding() line was sufficient to get the value stored correctly).

#!/usr/bin/env python3

"""Store values with high Unicode characters in PostgreSQL database.
"""

import argparse
import pyodbc

VALUE = "x \U0001F31C z"

parser = argparse.ArgumentParser()
parser.add_argument("--connection-string", "-c", required=True)
opts = parser.parse_args()
conn = pyodbc.connect(opts.connection_string)
conn.setdecoding(pyodbc.SQL_CHAR, encoding="utf-8")
conn.setdecoding(pyodbc.SQL_WCHAR, encoding="utf-8")
conn.setencoding(encoding="utf-8")
cursor = conn.cursor()
for table_name in ("t1", "t2"):
    cursor.execute(f"CREATE TABLE {table_name} (s VARCHAR(100))")
cursor.execute("INSERT INTO t1 VALUES (?)", (VALUE,))
cursor.execute("INSERT INTO t2 VALUES ('{VALUE}')")
conn.commit()
print("done")

image

gordthompson commented 2 years ago

Okay, for me the test passes on both Windows and Linux if I comment out both the setencoding() and setdecoding() in tests3/pgtests.py:

https://github.com/mkleehammer/pyodbc/blob/b1cfb75df79bd21593b0cd7d8002223d172de2ee/tests3/pgtests.py#L72-L73

So if that also passes on Mac then we can just remove those lines.

bkline commented 2 years ago

The only failing test with that modification of the script on macOS is test_columns() (addressed in #1003). It's sort of a nice bonus that the correct behavior across the platforms is achieved using the package's defaults. That feels right. 👍

(I'm learning all sorts of new tricks with these recent tickets. TIL about extending a permalink in GitHub to a range. 😂 )