Closed bkline closed 2 years ago
And for a bonus question: does anyone know why the name for the table schema value is `table_schem` instead of `table_schema`? I don't see any explanation in the documentation for this odd spelling, nor in the code comments.
I am unable to reproduce your issue.
```
$ python3 tests3/pgtests.py -t test_columns -vv
Library: /home/gord/git/pyodbc/build/lib.linux-x86_64-3.8/pyodbc.cpython-38-x86_64-linux-gnu.so
/home/gord/git/pyodbc/tests3 --> /home/gord/git/pyodbc
python:  3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0]
pyodbc:  4.0.33b6+test_with_dsn /home/gord/git/pyodbc/build/lib.linux-x86_64-3.8/pyodbc.cpython-38-x86_64-linux-gnu.so
odbc:    03.52
driver:  psqlodbcw.so 12.01.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_columns (__main__.PGTestCase) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.814s

OK
```
> driver: freetds stable 1.3.6
Really? AFAIK PostgreSQL doesn't speak TDS.
> PostgreSQL doesn't speak TDS.
Sorry, that was copy/pasted from another ticket. I have fixed the description above.
```
% python3 tests3/pgtests.py -t test_columns -vv
Library: /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
/Users/bkline/repos/pyodbc/tests3 --> /Users/bkline/repos/pyodbc
python:  3.10.1 (main, Dec  6 2021, 23:20:29) [Clang 13.0.0 (clang-1300.0.29.3)]
pyodbc:  4.0.33b6 /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
odbc:    03.52
driver:  psqlodbcw.so 13.02.0000
         supports ODBC version 03.51
os:      Darwin
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_columns (__main__.PGTestCase) ... ERROR

======================================================================
ERROR: test_columns (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 575, in test_columns
    assert row.precision == 3, row.precision
AttributeError: 'pyodbc.Row' object has no attribute 'precision'

----------------------------------------------------------------------
Ran 1 test in 0.054s

FAILED (errors=1)
```
All these years using the `unittest` module and I never knew about the `-t testname` option. 🤦
Okay, so apparently it's not strictly a Python 3.10 thing:
```
$ python3.10 tests3/pgtests.py -t test_columns -vv
Library: /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
/home/gord/git/pyodbc/tests3 --> /home/gord/git/pyodbc
python:  3.10.1 (main, Dec 21 2021, 17:46:38) [GCC 9.3.0]
pyodbc:  4.0.33b7+test_with_dsn /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
odbc:    03.52
driver:  psqlodbcw.so 12.01.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_columns (__main__.PGTestCase) ... ok
----------------------------------------------------------------------
Ran 1 test in 1.719s

OK
```
Can you run `odbcinst -j` to check your version of unixODBC?
The version of unixODBC is 2.3.9.
The documentation says:

> `columns(table=None, catalog=None, schema=None, column=None)`
>
> Creates a result set of column information in the specified tables using the SQLColumns function.
>
> Each row has the following columns:
>
> - table_cat
> - table_schem
> - table_name
> - column_name
> - data_type
> - type_name
> - column_size
> - buffer_length
> - decimal_digits
> - num_prec_radix
> - nullable
> - remarks
> - column_def
> - sql_data_type
> - sql_datetime_sub
> - char_octet_length
> - ordinal_position
> - is_nullable: One of SQL_NULLABLE, SQL_NO_NULLS, SQL_NULLS_UNKNOWN.
>
> ```python
> # columns in table x
> for row in cursor.columns(table='x'):
>     print(row.column_name)
> ```
Why would the test be written to look for a column which—according to that documentation—shouldn't even be there?
> Why would the test be written to look for a column which—according to that documentation—shouldn't even be there?
Good question. Maybe different drivers deviate somewhat from the ODBC spec. It's certainly one of the columns returned by the PostgreSQL driver:
```
[ODBC][4745][1641328615.293688][SQLDescribeCol.c][504]
                Exit:[SQL_SUCCESS]
                Column Name = [PRECISION]
                Data Type = 0x7ffd53ceae2a -> 4
                Column Size = 0x7ffd53ceae30 -> 10
                Decimal Digits = 0x7ffd53ceae2c -> 0
                Nullable = 0x7ffd53ceae2e -> 1
```
re: "Maybe different drivers deviate somewhat", above - apparently they do:
```python
# pgtests.py
row = results['xΏz']
assert row.type_name == 'varchar'
assert row.precision == 4, row.precision
```

```python
# sqlservertests.py
row = results['xΏz']
assert row.type_name == 'varchar'
assert row.column_size == 4, row.column_size
```
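One driver-agnostic way to write such a check (a sketch only; `get_column_size` is a hypothetical helper, not part of pyodbc or its test suite) is to probe for whichever attribute name the driver happened to report:

```python
from collections import namedtuple

def get_column_size(row):
    """Return the column-size value whether the driver calls it
    'column_size' (ODBC 3.x name) or 'precision' (ODBC 2.x name)."""
    for name in ("column_size", "precision"):
        if hasattr(row, name):
            return getattr(row, name)
    raise AttributeError("driver reported neither column_size nor precision")

# Demonstration with stand-ins for pyodbc.Row objects:
OldStyleRow = namedtuple("OldStyleRow", "type_name precision")    # ODBC 2.x driver
NewStyleRow = namedtuple("NewStyleRow", "type_name column_size")  # ODBC 3.x driver
print(get_column_size(OldStyleRow("varchar", 4)))  # 4
print(get_column_size(NewStyleRow("varchar", 4)))  # 4
```

Either style of row then satisfies the same assertion, regardless of which names the driver chose.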
Does this mean that the `pyodbc` documentation quoted above is only correct for some versions of some drivers?
> Does this mean that the `pyodbc` documentation quoted above is only correct for some versions of some drivers?
It seems so. It's probably based on what SQL Server ODBC does. For reference, here are the columns that PostgreSQL ODBC returns (for me, at least):
```python
>>> import pyodbc
>>> cnxn = pyodbc.connect("DRIVER=PostgreSQL Unicode;SERVER=192.168.0.199;UID=scott;PWD=tiger;DATABASE=test", autocommit=True)
>>> crsr = cnxn.cursor()
>>> crsr.execute("CREATE TABLE t (id int primary key)")
<pyodbc.Cursor object at 0x7f6e39da1e30>
>>> col_info = crsr.columns("t")
>>> from pprint import pprint
>>> pprint([x[0] for x in col_info.description])
['table_qualifier',
 'table_owner',
 'table_name',
 'column_name',
 'data_type',
 'type_name',
 'precision',
 'length',
 'scale',
 'radix',
 'nullable',
 'remarks',
 'column_def',
 'sql_data_type',
 'sql_datetime_sub',
 'char_octet_length',
 'ordinal_position',
 'is_nullable',
 'display_size',
 'field_type',
 'auto_increment',
 'physical number',
 'table oid',
 'base typeid',
 'typmod',
 'table info']
```
I would have expected that if `pyodbc` is documenting the names of the columns (which seems like the right thing to do, if the method is going to be at all useful), it would be taking care of mapping names used by lower layers to the documented names. That way (among other things) we could be sure that at least for a given version of the package, the documentation and the software would match each other.
I found what seems like a relevant commit for the psqlodbc driver, in which they say they are switching to "ODBC 3 column names for the result set of catalog functions." This commit was made in late August, so if the driver you're running is earlier than that, we may have found the explanation for why you see different results for the tests than I see.
You're probably right. You are using "psqlodbcw.so 13.02.0000" while I'm using "psqlodbcw.so 12.01.0000".
Sure enough, the release you're running (12.01.0000) was tagged almost exactly two years ago. The release I'm running (13.02.0000) was released in September (a month after the commit I linked to above).
It's not 100% clear that I'm reading the pipeline test logs correctly, but if I am the pipeline is running with release 11.01.0000 of psqlodbc, tagged May 2019.
> I would have expected that if `pyodbc` is documenting the names of the columns (which seems like the right thing to do, if the method is going to be at all useful), it would be taking care of mapping names used by lower layers to the documented names. That way (among other things) we could be sure that at least for a given version of the package, the documentation and the software would match each other.
The problem there is that ODBC drivers apparently feel free to change names and throw in any number of extra columns that may or may not exist in other implementations. (For example, there are 18 pyodbc-documented columns and psqlodbcw.so 12.01.0000 returns 26 columns.) Trying to document all possible variants would be … tedious. :)
> I found what seems like a relevant commit for the psqlodbc driver, in which they say they are switching to "ODBC 3 column names for the result set of catalog functions." This commit was made in late August, so if the driver you're running is earlier than that, we may have found the explanation for why you see different results for the tests than I see.
Excellent detective work there, BTW.
TL;DR: The tests should be verifying that the behavior of the software matches the documentation.
> The problem there is that ODBC drivers apparently feel free to change names ....
Absolutely! That's the very reason that `pyodbc` should present a well-documented method which exposes as stable a list of properties/names as possible. In this case, the names exposed by ODBC 3 seem like the logical choice. Or, if the currently documented names don't already match the ODBC 3 names, another reasonable option would be to stick with the names currently in the documentation (with annotation to point out how the names exposed by `pyodbc` correspond to the ODBC 3 names). Yes, we have absolutely no control over when the lower layers change the names they use. But we can provide a consistent, correctly documented interface to those names/properties.
> Trying to document all possible variants would be … tedious. :)
If you thought I was suggesting that we document all of the variants, then I didn't do a good enough job of explaining what I was hoping for. When I wrote "... it [`pyodbc`] should be mapping names used by lower layers to the documented names" I meant that the documented names should be the same regardless of what names are used by the drivers. The mapping would happen in the code. For example (in pseudo-code):
```
ASK THE DRIVER FOR THE COLUMN PROPERTIES
IF THE DRIVER GAVE US A PROPERTY NAMED 'column_size':
    MAP THE VALUE TO `column_size`
OTHERWISE, IF THE DRIVER GAVE US A PROPERTY NAMED 'precision':
    MAP THE VALUE TO `column_size`
....
```
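That idea can be sketched in Python roughly as follows. This is illustrative only: the alias table covers just a few of the ODBC 2.x → 3.x renames mentioned in this thread, and `normalize_names` is a hypothetical helper, not an existing pyodbc function.

```python
# Map driver-reported (ODBC 2.x) column names onto the documented
# (ODBC 3.x) names, so callers always see a stable interface.
# Partial alias table, based on the names seen in this thread:
ALIASES = {
    "table_qualifier": "table_cat",
    "table_owner":     "table_schem",
    "precision":       "column_size",
    "length":          "buffer_length",
    "scale":           "decimal_digits",
    "radix":           "num_prec_radix",
}

def normalize_names(driver_names):
    """Return the documented name for each name the driver reported,
    passing through names that need no translation."""
    return [ALIASES.get(name, name) for name in driver_names]

# An ODBC 2.x driver (e.g. psqlodbcw.so 12.01.0000) reports old names ...
old = ["table_qualifier", "table_owner", "table_name", "precision"]
print(normalize_names(old))
# ['table_cat', 'table_schem', 'table_name', 'column_size']
```

Extra driver-specific columns (psqlodbc's `table oid`, `typmod`, etc.) would simply pass through untouched; only the documented names need stable spellings.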
If that represents an unacceptably high level of effort (which it may well; this is a volunteer-driven, open-source project, after all), then it would be preferable for the documentation to avoid listing any column names to be found in the `Row` objects, and to say instead something along the lines of ...

> This package offers a non-standard `columns()` method to provide the properties for each column in a table as exposed by the underlying drivers, using the property names returned by those drivers. It is the responsibility of user code to dynamically discover which names are used for the columns of the returned `Row` objects. For example .... A more standard, fully documented interface for a subset of this information is provided in the `Cursor` object's `description` attribute.
And (assuming this is the road we go down) the unit tests should be testing that `description` attribute, not a volatile property whose names can change at the whim of the driver vendors. We absolutely don't want the unit tests to be at the mercy of what the driver vendors do, to the extent we can avoid it.
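For illustration, here is one way such a test could read metadata without depending on driver-chosen attribute names (a sketch; `describe` is a hypothetical helper). DB-API 2.0 fixes the layout of `cursor.description` as 7-item sequences — (name, type_code, display_size, internal_size, precision, scale, null_ok) — so tests can index those positions directly:

```python
# The seven fields PEP 249 defines for each cursor.description entry:
DESCRIPTION_FIELDS = ("name", "type_code", "display_size",
                      "internal_size", "precision", "scale", "null_ok")

def describe(description):
    """Turn a DB-API cursor.description into dicts keyed by the
    standard PEP 249 field names, independent of ODBC driver quirks."""
    return {col[0]: dict(zip(DESCRIPTION_FIELDS, col)) for col in description}

# Demonstration with a fabricated description for a varchar(4) column
# (in real tests this would come from an executed pyodbc cursor):
fake_description = [("xΏz", str, None, 4, None, None, True)]
cols = describe(fake_description)
print(cols["xΏz"]["internal_size"])  # 4
```

Because the positions are fixed by the DB-API specification rather than by any driver, an assertion written this way stays valid across psqlodbc releases.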
I would further recommend that all of the methods and properties which are not part of the Python DB-API 2.0 be clearly identified as package-specific extensions.
To be clear, none of the projects I work on use ODBC to talk with PostgreSQL. But when I submit a PR I would like to know that all of the unit tests pass, including DBMS-specific tests for stacks I don't necessarily use myself, when running under code which I have modified. It's not good enough that the tests happen to pass for the stacks used by the CI-pipeline.
> Excellent detective work there, BTW.
Thanks! 😅
Confirming that the `test_columns` test which fails running with psqlodbcw.so 13.02.0000 passes when I point to a freshly-built psqlodbcw.so 12.01.0000, with everything else in the stack identical.
**Environment**

HEAD of the `master` branch

**Issue**

The `test_columns()` unit test in `pgtests.py` tries to access `row.precision` for each `Row` object in the result set returned by the call to `cursor.columns()`. This results in an error for the test. According to the documentation, `precision` is not a column in the rows returned by `cursor.columns()`.