mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution

PostgreSQL unit test tries to access non-existent 'precision' attribute #1003

Closed: bkline closed this issue 2 years ago

bkline commented 2 years ago

Environment

Python: 3.10.1
pyodbc: 4.0.33b6
OS: macOS (Darwin, x86_64)
database: PostgreSQL
driver: psqlodbcw.so 13.02.0000, unixODBC 2.3.9

Issue

The test_columns() unit test in pgtests.py tries to access row.precision for each Row object in the result set returned by the call to cursor.columns(). This results in an error for the test:

======================================================================
ERROR: test_columns (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 575, in test_columns
    assert row.precision == 3, row.precision
AttributeError: 'pyodbc.Row' object has no attribute 'precision'

According to the documentation, precision is not a column in the rows returned by cursor.columns().
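For anyone wanting to reproduce this outside the test harness, here is a minimal sketch; the DSN and table name are hypothetical, so substitute your own:

import pyodbc

# hypothetical connection details; substitute your own DSN/credentials
cnxn = pyodbc.connect("DSN=pg-test", autocommit=True)
crsr = cnxn.cursor()
crsr.execute("CREATE TABLE t1 (s VARCHAR(3))")

row = crsr.columns(table="t1").fetchone()
print(row.column_size)  # present when the driver reports ODBC 3 catalog names
print(row.precision)    # AttributeError in that case; this is the access the test makes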

bkline commented 2 years ago

And for a bonus question: does anyone know why the name for the table schema value is table_schem instead of table_schema? I don't see any explanation in the documentation for this odd spelling, nor in the code comments.

gordthompson commented 2 years ago

I am unable to reproduce your issue.

$ python3 tests3/pgtests.py -t test_columns -vv
Library: /home/gord/git/pyodbc/build/lib.linux-x86_64-3.8/pyodbc.cpython-38-x86_64-linux-gnu.so
/home/gord/git/pyodbc/tests3 --> /home/gord/git/pyodbc
python:  3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0]
pyodbc:  4.0.33b6+test_with_dsn /home/gord/git/pyodbc/build/lib.linux-x86_64-3.8/pyodbc.cpython-38-x86_64-linux-gnu.so
odbc:    03.52
driver:  psqlodbcw.so 12.01.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_columns (__main__.PGTestCase) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.814s

OK

driver: freetds stable 1.3.6

Really? AFAIK PostgreSQL doesn't speak TDS.

bkline commented 2 years ago

PostgreSQL doesn't speak TDS.

Sorry, that was copy/pasted from another ticket. I have fixed the description above.

% python3 tests3/pgtests.py -t test_columns -vv
Library: /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
/Users/bkline/repos/pyodbc/tests3 --> /Users/bkline/repos/pyodbc
python:  3.10.1 (main, Dec  6 2021, 23:20:29) [Clang 13.0.0 (clang-1300.0.29.3)]
pyodbc:  4.0.33b6 /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
odbc:    03.52
driver:  psqlodbcw.so 13.02.0000
         supports ODBC version 03.51
os:      Darwin
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_columns (__main__.PGTestCase) ... ERROR

======================================================================
ERROR: test_columns (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 575, in test_columns
    assert row.precision == 3, row.precision
AttributeError: 'pyodbc.Row' object has no attribute 'precision'

----------------------------------------------------------------------
Ran 1 test in 0.054s

FAILED (errors=1)

bkline commented 2 years ago

All these years using the unittest module and I never knew about the -t testname option. 🤦

gordthompson commented 2 years ago

Okay, so apparently it's not strictly a Python 3.10 thing:

$ python3.10 tests3/pgtests.py -t test_columns -vv
Library: /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
/home/gord/git/pyodbc/tests3 --> /home/gord/git/pyodbc
python:  3.10.1 (main, Dec 21 2021, 17:46:38) [GCC 9.3.0]
pyodbc:  4.0.33b7+test_with_dsn /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
odbc:    03.52
driver:  psqlodbcw.so 12.01.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_columns (__main__.PGTestCase) ... ok

----------------------------------------------------------------------
Ran 1 test in 1.719s

OK

Can you run odbcinst -j to check your version of unixODBC?

bkline commented 2 years ago

The version of unixODBC is 2.3.9.

bkline commented 2 years ago

The documentation says:

columns(table=None, catalog=None, schema=None, column=None)
Creates a result set of column information in the specified tables using the SQLColumns function.

Each row has the following columns:

table_cat
table_schem
table_name
column_name
data_type
type_name
column_size
buffer_length
decimal_digits
num_prec_radix
nullable
remarks
column_def
sql_data_type
sql_datetime_sub
char_octet_length
ordinal_position
is_nullable: One of SQL_NULLABLE, SQL_NO_NULLS, SQL_NULLS_UNKNOWN.

# columns in table x
for row in cursor.columns(table='x'):
    print(row.column_name)

Why would the test be written to look for a column which—according to that documentation—shouldn't even be there?

gordthompson commented 2 years ago

Why would the test be written to look for a column which—according to that documentation—shouldn't even be there?

Good question. Maybe different drivers deviate somewhat from the ODBC spec. It's certainly one of the columns returned by the PostgreSQL driver:

[ODBC][4745][1641328615.293688][SQLDescribeCol.c][504]
        Exit:[SQL_SUCCESS]                
            Column Name = [PRECISION]                
            Data Type = 0x7ffd53ceae2a -> 4                
            Column Size = 0x7ffd53ceae30 -> 10                
            Decimal Digits = 0x7ffd53ceae2c -> 0                
            Nullable = 0x7ffd53ceae2e -> 1

re: "Maybe different drivers deviate somewhat", above - apparently they do:

[pgtests.py]

        row = results['xΏz']
        assert row.type_name == 'varchar'
        assert row.precision == 4, row.precision

[sqlservertests.py]

        row = results['xΏz']
        assert row.type_name == 'varchar'
        assert row.column_size == 4, row.column_size
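A test that wanted to tolerate both naming schemes could, purely as a sketch, fall back from one attribute name to the other:

# sketch: accept either the ODBC 3 name (column_size) or the ODBC 2 name (precision)
size = getattr(row, "column_size", None)
if size is None:
    size = getattr(row, "precision", None)
assert size == 4, size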

bkline commented 2 years ago

Does this mean that the pyodbc documentation quoted above is only correct for some versions of some drivers?

gordthompson commented 2 years ago

Does this mean that the pyodbc documentation quoted above is only correct for some versions of some drivers?

It seems so. It's probably based on what SQL Server ODBC does. For reference, here are the columns that PostgreSQL ODBC returns (for me, at least):

>>> import pyodbc
>>> cnxn = pyodbc.connect("DRIVER=PostgreSQL Unicode;SERVER=192.168.0.199;UID=scott;PWD=tiger;DATABASE=test", autocommit=True)
>>> crsr = cnxn.cursor()
>>> crsr.execute("CREATE TABLE t (id int primary key)")
<pyodbc.Cursor object at 0x7f6e39da1e30>
>>> col_info = crsr.columns("t")
>>> from pprint import pprint
>>> pprint([x[0] for x in col_info.description])
['table_qualifier',
 'table_owner',
 'table_name',
 'column_name',
 'data_type',
 'type_name',
 'precision',
 'length',
 'scale',
 'radix',
 'nullable',
 'remarks',
 'column_def',
 'sql_data_type',
 'sql_datetime_sub',
 'char_octet_length',
 'ordinal_position',
 'is_nullable',
 'display_size',
 'field_type',
 'auto_increment',
 'physical number',
 'table oid',
 'base typeid',
 'typmod',
 'table info']

bkline commented 2 years ago

I would have expected that if pyodbc is documenting the names of the columns (which seems like the right thing to do, if the method is going to be at all useful), it would be taking care of mapping names used by lower layers to the documented names. That way (among other things) we could be sure that at least for a given version of the package, the documentation and the software would match each other.

I found what seems like a relevant commit for the psqlodbc driver, in which they say they are switching to "ODBC 3 column names for the result set of catalog functions." This commit was made in late August, so if the driver you're running is earlier than that, we may have found the explanation for why you see different results for the tests than I see.

gordthompson commented 2 years ago

You're probably right. You are using "psqlodbcw.so 13.02.0000" while I'm using "psqlodbcw.so 12.01.0000".

bkline commented 2 years ago

Sure enough, the release you're running (12.01.0000) was tagged almost exactly two years ago. The release I'm running (13.02.0000) was released in September (a month after the commit I linked to above).

bkline commented 2 years ago

It's not 100% clear that I'm reading the pipeline test logs correctly, but if I am, the pipeline is running with release 11.01.0000 of psqlodbc, tagged May 2019.

gordthompson commented 2 years ago

I would have expected that if pyodbc is documenting the names of the columns (which seems like the right thing to do, if the method is going to be at all useful), it would be taking care of mapping names used by lower layers to the documented names. That way (among other things) we could be sure that at least for a given version of the package, the documentation and the software would match each other.

The problem there is that ODBC drivers apparently feel free to change names and throw in any number of extra columns that may or may not exist in other implementations. (For example, there are 18 pyodbc-documented columns and psqlodbcw.so 12.01.0000 returns 26 columns.) Trying to document all possible variants would be … tedious. :)

I found what seems like a relevant commit for the psqlodbc driver, in which they say they are switching to "ODBC 3 column names for the result set of catalog functions." This commit was made in late August, so if the driver you're running is earlier than that, we may have found the explanation for why you see different results for the tests than I see.

Excellent detective work there, BTW.

bkline commented 2 years ago

TL;DR: The tests should be verifying that the behavior of the software matches the documentation.

The problem there is that ODBC drivers apparently feel free to change names ....

Absolutely! That's the very reason that pyodbc should present a well-documented method which exposes as stable a list of properties/names as possible. In this case, the names exposed by ODBC 3 seem like the logical choice. Or, if the currently documented names don't already match the ODBC 3 names, another reasonable option would be to stick with the names currently in the documentation (with annotations to point out how the names exposed by pyodbc correspond to the ODBC 3 names). Yes, we have absolutely no control over when the lower layers change the names they use. But we can provide a consistent, correctly documented interface to those names/properties.

Trying to document all possible variants would be … tedious. :)

If you thought I was suggesting that we document all of the variants, then I didn't do a good enough job of explaining what I was hoping for. When I wrote "... it [pyodbc] should be mapping names used by lower layers to the documented names" I meant that the documented names should be the same regardless of what names are used by the drivers. The mapping would happen in the code. For example (in pseudo-code):

ASK THE DRIVER FOR THE COLUMN PROPERTIES
IF THE DRIVER GAVE US A PROPERTY NAMED 'column_size':
    MAP THE VALUE TO `column_size`
OTHERWISE, IF THE DRIVER GAVE US A PROPERTY NAMED 'precision':
    MAP THE VALUE TO `column_size`
....
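In Python, a sketch of that mapping might look like this; the rename table follows the documented ODBC 2.x-to-3.x catalog column renames, and is illustrative rather than exhaustive:

# sketch: normalize driver-reported catalog column names to the ODBC 3 names
ODBC2_TO_ODBC3 = {
    "table_qualifier": "table_cat",
    "table_owner":     "table_schem",
    "precision":       "column_size",
    "length":          "buffer_length",
    "scale":           "decimal_digits",
    "radix":           "num_prec_radix",
}

def normalized_columns(cursor, **kwargs):
    cursor.columns(**kwargs)  # e.g. table="t1"
    names = [ODBC2_TO_ODBC3.get(d[0].lower(), d[0].lower())
             for d in cursor.description]
    for row in cursor:
        yield dict(zip(names, row))

With a helper like that, user code (and the unit tests) could look up col["column_size"] regardless of which name the driver reports.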

If that represents an unacceptably high level of effort (which it may well; this is a volunteer-driven, open-source project, after all), then it would be preferable for the documentation to avoid listing any column names to be found in the Row objects, and to say instead something along the lines of ...

This package offers a non-standard columns() method to provide the properties for each column in a table as exposed by the underlying drivers, using the property names returned by those drivers. It is the responsibility of user code to dynamically discover which names are used for the columns of the returned Row objects. For example .... A more standard, fully documented interface for a subset of this information is provided in the Cursor object's description attribute.
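By way of illustration only (not proposed wording for the documentation), that dynamic discovery could look like this, where crsr is a pyodbc Cursor and the table name is hypothetical:

# sketch: discover the driver-reported names before relying on any of them
col_info = crsr.columns(table="t1")
names = [d[0] for d in col_info.description]
size_key = "column_size" if "column_size" in names else "precision"
for row in col_info:
    print(row.column_name, getattr(row, size_key))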

And (assuming this is the road we go down) the unit tests should be testing that description attribute, not a volatile property whose names can change at the whim of the driver vendors. We absolutely don't want the unit tests to be at the mercy of what the driver vendors do, to the extent we can avoid it.
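A sketch of what such a test could check, again with a hypothetical table t1 containing a single column s:

# sketch: cursor.description is defined by DB-API 2.0 as a sequence of
# (name, type_code, display_size, internal_size, precision, scale, null_ok),
# so it does not depend on driver-specific catalog column names
crsr.execute("SELECT * FROM t1 WHERE 1 = 0")  # no rows needed, only metadata
desc_names = [d[0] for d in crsr.description]
assert desc_names == ["s"], desc_names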

I would further recommend that all of the methods and properties which are not part of the Python DB-API 2.0 be clearly identified as package-specific extensions.

To be clear, none of the projects I work on use ODBC to talk with PostgreSQL. But when I submit a PR I would like to know that all of the unit tests pass, including DBMS-specific tests for stacks I don't necessarily use myself, when running under code which I have modified. It's not good enough that the tests happen to pass for the stacks used by the CI pipeline.

Excellent detective work there, BTW.

Thanks! 😅

bkline commented 2 years ago

Confirming that the test_columns test, which fails with psqlodbcw.so 13.02.0000, passes when I point to a freshly built psqlodbcw.so 12.01.0000, with everything else in the stack identical.