mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.92k stars 561 forks source link

"The text, ntext, and image data types cannot be compared or sorted, …" #1058

Closed eabase closed 2 years ago

eabase commented 2 years ago

When reading separate columns using the following DB structure and code, I get error when the column is of the type TEXT.

The table schema type for that is:

TABLE_NAME            COLUMN_NAME        DATA_TYPE      CHARACTER_MAXIMUM_LENGTH
----------            -----------        ---------      ------------------------
mytable  col_name  text  2147483647

I'm then trying to reading all the columns (one at a time) using:

def get_data(table_list):
    cursor = connect()
    for table, column in table_list:
        SQLA = 'SELECT TOP 1 [{}] as data FROM [{}] ORDER BY [{}] DESC;'.format(column, table, column)
        cursor.execute(SQLA)

        while 1:
            row = cursor.fetchone()
            if not row:
                break
            print(' Data : {}'.format(row.data))
    db_close(cursor)

This works fine until it hits the first column with a TEXT type (as shown above).

Running the same query from powershell CMD yields the same result:

$ iex "$qcmd 'SELECT TOP 1 [notes] FROM [mytable ] ORDER BY [notes] DESC;' -W"

Msg 306, Level 16, State 2, Server DBSQL01, Line 1
The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

Environment

[INFO]  Getting MS SQL Server Version info...
 ------------------------------------------------------------
 Microsoft SQL Server 2008 R2 (SP3-GDR) (KB4057113) - 10.50.6560.0 (X64)
 Dec 28 2017 15:03:48
 Copyright (c) Microsoft Corporation
 Enterprise Edition (64-bit) on Windows NT 6.2 <X64> (Build 9200: ) (Hypervisor)

Issue

Observed behavior:

I get the following error trying to read MS SQL columns of the type TEXT.

pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]
The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator. (306) (SQLExecDirectW)')

Expected behavior:

That it works without errors.

v-chojas commented 2 years ago

This is by design. The error comes from the server. It has nothing to do with pyodbc. The text type is very limited in what can be done in terms of processing it on the server side, hence why it is considered a deprecated type. I realise it is not always possible for you to change the server but varchar(max) is the recommended replacement and does not have these limitations.

eabase commented 2 years ago

@v-chojas Thanks for quick reply, but I'm working with some very old MS SQL DB's so I have no option to modify the DB's. Since the data is just some short (<200 chars) byte strings, there surely must be some kind of workaround for this?

It would also be interesting to understand on the more technical level, why this happens. What is the problem with the TEXT fields?

v-chojas commented 2 years ago

You can try cast(col_name as varchar(max))

The text type is one of the early large object (LOB) types, it is stored and processed differently by the server than the regular types.

eabase commented 2 years ago

@v-chojas

cast(col_name as varchar(max)) Great! But how do I put that into the query?

v-chojas commented 2 years ago

Where the column name is used.

eabase commented 2 years ago

I managed to get around this with:

SQLA = 'SELECT TOP 1 [{}] AS data FROM [{}] ORDER BY cast([{}] as varchar(MAX)) DESC;'.format(column, table, column,column)

Now there is a similar issue with the IMAGE and NVARCHAR types. IMAGE is shown as a 0x2F prefixed HEX blob, while NVARCHAR is referring to the content of a text file. From other documentation NVARCHAR is a 2-byte Unicode format, so one should be able to convert is somehow.

Others say:

NTEXT data types are used for variable-length of Unicode data, TEXT data types are used for variable-length non-Unicode data while IMAGE data types are used for variable-length binary data.

Or from this SO answer:

Starting with SQL Server 2012 (11.x), when a Supplementary Character (SC) enabled collation is used, these data types store the full range of Unicode character data and use the UTF-16 character encoding. If a non-SC collation is specified, then these data types store only the subset of character data supported by the UCS-2 character encoding

v-chojas commented 2 years ago

Yes, ntext is UTF-16 and image is plain binary. You can cast them to nvarchar(max) and varbinary(max) respectively.