mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.95k stars 562 forks source link

Strange additional characters in Oracle error messages. #489

Open mpf82 opened 5 years ago

mpf82 commented 5 years ago

Environment

Issue

When I recently upgraded from pyodbc 4.0.17 to 4.0.24 I noticed that I received additional characters in Oracle error messages.

For example the query SELECT FROM dual; (note the missing *) will return ('HY000', '[HY000] [Oracle][ODBC][Ora]ORA-00936: missing expression\n (936) (SQLExecDirectW)') in pyodbc 4.0.17.

(When I say "return" in this context, I mean it is throwing an exception with this message)

In pyodbc 4.0.18 (and also the latest 4.0.24) the same query returns ('HY000', '[HY000] [Oracle][ODBC][Ora]ORA-00936: missing expression\n\x00\x00\x00\x00\x00\x00\x00ᾨգ\x00\x00\x00\x00\x00\x00\uec28昝\x00\x00鋈թ\x00\x00ຂ昘\x00\x00㙐\x00\x00\x00\ueed0¾\x00\x00\ueed4¾\x00\x00\x00\x00\x00\x00㙐\x00 (936) (SQLExecDirectW)')

In the 4.0.18 release notes I have found:

Fix Unicode error messages that were sometimes garbled in the past.

But it seems from this version on, the error messages now contain additional garbage data, at least in Oracle.

I have also tried various combinations of setencoding and setdecoding, but this strange "extra data" was never successfully unscrambled.

AFAICT, everything else (INSERT / UPDATE / DELETE / ...) is working fine.

Edit:

I am not sure if this related to issue https://github.com/mkleehammer/pyodbc/issues/369 because, as you can see, the first part of the error is perfectly readable, and that's basically all there is to the error.

This issue is about extra data that is contained in the exception, that does not seem to be part of the actual error message.

v-chojas commented 5 years ago

Please post an ODBC trace for more information.

mpf82 commented 5 years ago

pyodbc 4.0.17:

test_pyodbc3    4734-3d24   ENTER SQLExecDirectW 
        HSTMT               0x00000000053F26C0
        WCHAR *             0x0000000004F401B8 [      16] "select from dual"
        SDWORD                    16

test_pyodbc3    4734-3d24   EXIT  SQLExecDirectW  with return code -1 (SQL_ERROR)
        HSTMT               0x00000000053F26C0
        WCHAR *             0x0000000004F401B8 [      16] "select from dual"
        SDWORD                    16

        DIAG [HY000] [Oracle][ODBC][Ora]ORA-00936: Ausdruck fehlt
 (936) 

test_pyodbc3    4734-3d24   ENTER SQLGetDiagRecW 
        SQLSMALLINT                  3 <SQL_HANDLE_STMT>
        SQLHANDLE           0x00000000053F26C0
        SQLSMALLINT                  1 
        SQLWCHAR *          0x0000000000FEF290
        SQLINTEGER *        0x0000000000FEF7A8
        SQLWCHAR *          0x00000000053F2D10 
        SQLSMALLINT               1023 
        SQLSMALLINT *       0x0000000000FEF7A0

test_pyodbc3    4734-3d24   EXIT  SQLGetDiagRecW  with return code 0 (SQL_SUCCESS)
        SQLSMALLINT                  3 <SQL_HANDLE_STMT>
        SQLHANDLE           0x00000000053F26C0
        SQLSMALLINT                  1 
        SQLWCHAR *          0x0000000000FEF290 [       5] "HY000"
        SQLINTEGER *        0x0000000000FEF7A8 (936)
        SQLWCHAR *          0x00000000053F2D10 [      90] "[Oracle][ODBC][Ora]ORA-00936: Ausdruck fehlt\ a\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0"
        SQLSMALLINT               1023 
        SQLSMALLINT *       0x0000000000FEF7A0 (90)

test_pyodbc3    4734-3d24   ENTER SQLGetDiagRecW 
        SQLSMALLINT                  3 <SQL_HANDLE_STMT>
        SQLHANDLE           0x00000000053F26C0
        SQLSMALLINT                  2 
        SQLWCHAR *          0x0000000000FEF290
        SQLINTEGER *        0x0000000000FEF7A8
        SQLWCHAR *          0x00000000053F2D10 
        SQLSMALLINT               1023 
        SQLSMALLINT *       0x0000000000FEF7A0

test_pyodbc3    4734-3d24   EXIT  SQLGetDiagRecW  with return code 100 (SQL_NO_DATA_FOUND)

pyodbc 4.0.24:

test_pyodbc3    3468-2fb0   ENTER SQLExecDirectW 
        HSTMT               0x00000000053926C0
        WCHAR *             0x0000000004EC0D88 [      16] "select from dual"
        SDWORD                    16

test_pyodbc3    3468-2fb0   EXIT  SQLExecDirectW  with return code -1 (SQL_ERROR)
        HSTMT               0x00000000053926C0
        WCHAR *             0x0000000004EC0D88 [      16] "select from dual"
        SDWORD                    16

        DIAG [HY000] [Oracle][ODBC][Ora]ORA-00936: Ausdruck fehlt
 (936) 

test_pyodbc3    3468-2fb0   ENTER SQLGetDiagRecW 
        SQLSMALLINT                  3 <SQL_HANDLE_STMT>
        SQLHANDLE           0x00000000053926C0
        SQLSMALLINT                  1 
        SQLWCHAR *          0x0000000000FEEA50
        SQLINTEGER *        0x0000000000FEEA48
        SQLWCHAR *          0x0000000000FEEA60 
        SQLSMALLINT               1023 
        SQLSMALLINT *       0x0000000000FEF2B8

test_pyodbc3    3468-2fb0   EXIT  SQLGetDiagRecW  with return code 0 (SQL_SUCCESS)
        SQLSMALLINT                  3 <SQL_HANDLE_STMT>
        SQLHANDLE           0x00000000053926C0
        SQLSMALLINT                  1 
        SQLWCHAR *          0x0000000000FEEA50 [       5] "HY000"
        SQLINTEGER *        0x0000000000FEEA48 (936)
        SQLWCHAR *          0x0000000000FEEA60 [      90] "[Oracle][ODBC][Ora]ORA-00936: Ausdruck fehlt\ a\ 0\ 0\ 0\ 0\ 0\ 0\ 0??\ 0\ 0\ 0\ 0\ 0\ 0??\ 0\ 0??\ 0\ 0??\ 0\ 0?\ 0\ 0\ 0?\ff\ 0\ 0?\ff\ 0\ 0\ 0\ 0\ 0\ 0?\ 0"
        SQLSMALLINT               1023 
        SQLSMALLINT *       0x0000000000FEF2B8 (90)

test_pyodbc3    3468-2fb0   ENTER SQLGetDiagRecW 
        SQLSMALLINT                  3 <SQL_HANDLE_STMT>
        SQLHANDLE           0x00000000053926C0
        SQLSMALLINT                  2 
        SQLWCHAR *          0x0000000000FEEA50
        SQLINTEGER *        0x0000000000FEEA48
        SQLWCHAR *          0x0000000000FEEA60 
        SQLSMALLINT               1023 
        SQLSMALLINT *       0x0000000000FEF2B8

test_pyodbc3    3468-2fb0   EXIT  SQLGetDiagRecW  with return code 100 (SQL_NO_DATA_FOUND)

Please let me know if you need any other information from the ODBC trace.

v-chojas commented 5 years ago

This appear to be a bug in the ODBC driver, not pyODBC. The error message is 45 characters long, or 90 bytes. The driver is returning 90 in SQLGetDiagRecW, which is specified to be in number of characters and not bytes: https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdiagrec-function Thus pyODBC correctly takes 90 characters, resulting in the extra characters at the end.

mpf82 commented 5 years ago

But I use the exact same setup in both cases, the only thing different is the pyodbc version.

As you can see, in 4.0.17 the error message is "[Oracle][ODBC][Ora]ORA-00936: Ausdruck fehlt\ a\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0"

and in 4.0.24 the error message is "[Oracle][ODBC][Ora]ORA-00936: Ausdruck fehlt\ a\ 0\ 0\ 0\ 0\ 0\ 0\ 0??\ 0\ 0\ 0\ 0\ 0\ 0??\ 0\ 0??\ 0\ 0??\ 0\ 0?\ 0\ 0\ 0?\ff\ 0\ 0?\ff\ 0\ 0\ 0\ 0\ 0\ 0?\ 0"

Why would the same driver return different values for different pyodbc versions?

v-chojas commented 5 years ago

The data past the actual end of the error message is undefined.

mpf82 commented 5 years ago

It may be undefined, but it is always reproducible.

No problems with pyodbc 4.0.17 and lower, and always these extra characters with pyodbc 4.0.18 and higher.

So something must have changed from 4.0.18 onwards, right?

v-chojas commented 5 years ago

The ODBC trace shows that the driver is returning an incorrect length for the message, which is twice of what it should be. The fact that previous versions of pyODBC produced a correct result should be considered nothing more than coincidence, as any small change in your configuration could cause the results to change.

pannetron commented 5 years ago

IMHO this is pyodbc mishandling the error string after 4.0.18 as I have the same issue as described above. We're seeing this in python3.7.1, pyodbc 4.0.24, on 64bit Linux with Oracle 12 DB. This worked perfectly with python 3.3.2 and pyodbc 3.0.7. All use the same version of unixODBC, 2.3.0.

v-chojas commented 5 years ago

To repeat again, this is a bug in the ODBC driver which is returning an incorrect length.

mkleehammer commented 5 years ago

The difference is that in 4.0.17 we used SQLGetDiagRec but for Unicode error message support, which is a critical item, we switched to SQLGetDiagRecW (notice the 'W' on the end for "wide").

I have to agree with v-chojas here and say that the Oracle driver seems to have a bug. The SQLGetDiagRec docs are pretty clear that the length should be in characters, not bytes.

@pannetron Can you open an issue with Oracle and ask about this?

pannetron commented 5 years ago

Michael,

A HUGE thank you for the details of the changes made after 4.0.17!

At CU Boulder's Laboratory for Atmospheric and Space Physics, we operate several spacecraft for NASA, some quite old. It's a BIG deal for us to change our foundation software platforms because we must avoid upsetting things that work. For this reason we seldom upgrade to recent packages and we're on some admittedly pretty dated Oracle driver software. We must also support multiple platforms including Solaris, multiple flavors of Linux, and several programming languages in addition to python. For those reasons we've elected to avoid updating Oracle drivers to our Oracle databases.

I will take an action item to investigate how the error strings are returned from the driver and if the Oracle bug still exists in their current driver versions. Do you think that the thin shell of unixODBC might also be suspect? The version of Oracle drivers we use requires the use of an older, compatible version of unixODBC. We realize that the "correct" solution is to update everything and that's on our strategic plan, it's just going require years in our environment!

If we revert to pyodbc 4.0.17, will we miss out on significant bug fixes or just new features that our older drivers probably don't support anyway? I'll look into change logs in the pyodbc project to try to answer my own question.

Again, thank you for the detail in your first sentence!

Cheers,

Russ Panneton russell.panneton@lasp.colorado.edu Operations Software Laboratory for Atmospheric and Space Physics University of Colorado Boulder +1-303-492-7618

On Thu, Dec 13, 2018 at 8:32 PM Michael Kleehammer notifications@github.com wrote:

The difference is that in 4.0.17 we used SQLGetDiagRec but for Unicode error message support, which is a critical item, we switched to SQLGetDiagRecW (notice the 'W' on the end for "wide").

I have to agree with v-chojas here and say that the Oracle driver seems to have a bug. The SQLGetDiagRec https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdiagrec-function?view=sql-server-2017 docs are pretty clear that the length should be in characters, not bytes.

@pannetron https://github.com/pannetron Can you open an issue with Oracle and ask about this?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mkleehammer/pyodbc/issues/489#issuecomment-447201936, or mute the thread https://github.com/notifications/unsubscribe-auth/ARvMk7veCFzamISZEUefJea7l_nMn-ffks5u4xu7gaJpZM4YvVQ- .

mkleehammer commented 5 years ago

Driver bugs, or sometimes purposefully ignoring the spec, have been a constant over pyodbc's lifetime and I've tried to work around them where possible. For example, the text-encoding settings are all to work around the fact that the ODBC specification tells us the encodings that should be used, but for performance many drivers use UTF-8.

Sometimes the products even have flags - note this one from IBM

Some types of behavior, like the one here, are very easy to work around with a flag. I'd be happy to do so but want to quickly see if anyone has any suggestions on a reasonable interface for these types of flags. Should we use just a flags bitfield or should we plan on more complicated settings? If so, should each flag be a property so we can change the types?

# default
cnxn.compat_diagrec = None

# length is in bytes, but Unicode
cnxn.compat_diagrec = 'byte-length'

# ASCII, so we know length is in bytes
cnxn.compat_diagrec = 'ascii'

I like the idea of having a common prefix to emphasize these are deviations from normal behavior. Is there a better prefix?

cnxn.compat_diagrec = 'byte-length'
cnxn.workaround_diagrec = 'byte-length'
cnxn.fix_diagrec = 'byte-length'
cnxn.driverbug_diagrec = 'byte-length'

@gordthompson @v-chojas What do you guys think? Even when we can get driver issues fixed, there are environments where a quick workaround is handy.

v-chojas commented 5 years ago

@pannetron if 4.0.17 is working fine for you and you are in legacy environment unwilling to upgrade everything else, then there is no reason to change it. Do note that unixODBC 2.3.0 is very, very buggy (see the changelog at http://www.unixodbc.org/ --- search for SQLGetDiagFieldW) and I would not be surprised if it is the cause here. Unfortunately if your driver is using 2.3.0 it can't use any later version because unixODBC changed its ABI between 2.3.0 (which shares the same ABI as the 2.2.x) and 2.3.1+.

@mkleehammer unless there is a strong need for someone with an old buggy driver+DM to use a new pyODBC, I would hold off on introducing yet more complexity in those codepaths. If anything, introducing an option to switch diag recs between ANSI/Unicode might be better in this case and in general. With combinations of bugs in the DM and driver you may get anywhere from 1/8 to 8x the expected length value, and it might even depend on which characters are present in the string.

pannetron commented 5 years ago

In case other folks can benefit from this, I'm reporting that we installed pyodbc 4.0.17 in our python 3.7.1 environment and now we're good to go. For us, this is an effective and acceptable near-term workaround.

dplynskiy commented 5 years ago

Hi. Have same problem with python3.7.3, pyodbc 4.0.24 and unixODBC 2.3.7 How i can fix it?

v-chojas commented 5 years ago

As discussed above, the bug is in the ODBC driver --- owned by Oracle, not pyODBC.

dplynskiy commented 5 years ago

Right. But how about compat_diagrec flag realization? it's will be nice.

reedjosh commented 4 years ago

Yes please! I would love a compat flag or the like. ATM, there isn't much of a workaround as trying to catch and better parse the issue only works sometimes. Since pyodbc tries to decode the message and it sometimes has non-encodable characters. So sometimes I catch pyodbc.Error and sometimes UnicodeDecoeError which doesn't seem to have the original message.

reedjosh commented 4 years ago

At the least if I could get back a raw message in some way I can easily enough reduce the length by two. Unfortunately on occasion the UnicodeDecodeError is thrown and I no longer have access to the original message. : /

reedjosh commented 4 years ago

Can confirm that removing the *2 that is *sizeof(ODBCCHAR) from the below errors.cpp snippet works around this as I just removed it and installed from source. So, how can we get this info into errors.cpp? I'm not great with cpp and know even less about cpython, but I'd love to help and submit a pull request.

        // Note that this will not work if the DM is using a different wide encoding (e.g. UTF-32).
        const char *unicode_enc = conn ? conn->metadata_enc.name : ENCSTR_UTF16NE;
        Object msgStr(PyUnicode_Decode((char*)szMsg, cchMsg * sizeof(ODBCCHAR), unicode_enc, "strict"));

        if (cchMsg != 0 && msgStr.Get())

@mkleehammer Regarding how the flags should be done. I would say something like a sub-object.

so not just

cnxn.compat_diagrec = 'byte-length'
cnxn.workaround_diagrec = 'byte-length'
cnxn.fix_diagrec = 'byte-length'
cnxn.driverbug_diagrec = 'byte-length'

but

cnxn.compat.diagrec = 'byte-length'
cnxn.workaround.diagrec = 'byte-length'
cnxn.fix.diagrec = 'byte-length'
cnxn.driverbug.diagrec = 'byte-length'

That way whatever name is decided on to indicate this is a fix for what someone outside of pyodbc broke, that name can be a single attribute that is carried around instead of a bunch of one off attributes.

Otherwise, we could also... 1). Use env vars. (not my favorite, but simple and easy to remove if the drivers ever get fixed. 2). Check the current driver version and apply a fix automagically. (I like, but I could see this being too automagical.)

Either way, let's do this! : )

v-chojas commented 4 years ago

@reedjosh What DM are you using? If unixODBC, you should try to upgrade that first.

reedjosh commented 4 years ago

@v-chojas ... I have upgraded to the latest. Compiled unixODBC from source. This bug is still present. Simply rebuilding pyodbc with the *sizeof(ODBCCHAR) removed from: https://github.com/mkleehammer/pyodbc/blob/master/src/errors.cpp#L295 solves this, but I'd prefer a workaround option instead of modifying source. : /

Further, config options would be useful for issues like https://github.com/mkleehammer/pyodbc/blob/master/src/errors.cpp#L268

NicoB77 commented 2 years ago

As a quick fix, wouldn't it be good to change the error handling and replace "strict" in


// For now, default to UTF-16 if this is not in the context of a connection.
// Note that this will not work if the DM is using a different wide encoding (e.g. UTF-32).
const char *unicode_enc = conn ? conn->metadata_enc.name : ENCSTR_UTF16NE;
Object msgStr(PyUnicode_Decode((char*)szMsg, cchMsg * sizeof(ODBCCHAR), unicode_enc, "strict"));

with e.g. "replace"? Or at least create an option to do so? It is quite unpleasant that decoding errors in error message raise a SystemError.