tanrj / pyodbc

Automatically exported from code.google.com/p/pyodbc
MIT No Attribution
0 stars 0 forks source link

inconsistent comparison of unicode/non-unicode strings with MySQL / ODBC #25

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
using MySQL with pyodbc, result values from table selects appear to come
back as Python unicode objects (though oddly, not from literal string
selects).   But then it also will fail to compare a unicode value to a
non-unicode value on the server, leading to this contradictory behavior:

    import pyodbc
    c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")

    cursor = c.cursor()

    # passes
    cursor.execute("drop table foo")
    cursor.execute("create table foo(data varchar(30))")
    cursor.execute("insert into foo values (?)", u'test')
    cursor.execute("select * from foo")
    x = cursor.fetchone()[0]
    cursor.execute("select count(1) from foo where data=?", x)
    assert cursor.fetchone()[0] == 1

    # fails
    cursor.execute("drop table foo")
    cursor.execute("create table foo(data varchar(30))")
    cursor.execute("insert into foo values (?)", 'test')
    cursor.execute("select * from foo")
    x = cursor.fetchone()[0]
    cursor.execute("select count(1) from foo where data=?", x)
    assert cursor.fetchone()[0] == 1

Where in the second test, the value of 'test' in the database does not
compare to that of u'test'.  not sure if theres an ODBC setting I should be
using here.  

Original issue reported on code.google.com by zzz...@gmail.com on 15 Jan 2009 at 12:46

GoogleCodeExporter commented 8 years ago
on further investigation, the problem is not the comparison.  It's just unicode
objects as bind parameters that are busted:

    import pyodbc
    c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")

    cursor = c.cursor()

    cursor.execute("select ?", u'test')
    print cursor.fetchone()

    cursor.execute("select ?", 'test')
    print cursor.fetchone()

It's just a little rough since result sets return strings as unicode objects, so
round trips have to take an asymmetric approach to bind/result handling.  
Michael
Trier tells me pyodbc with MSSQL doesn't have this issue.  I'm on OSX if that 
matters.

Original comment by zzz...@gmail.com on 15 Jan 2009 at 4:26

GoogleCodeExporter commented 8 years ago
I'll have to install MySQL this weekend and try it.  It definitely does not 
behave
that way on MSSQL; the returned varchar values would be str, not Unicode.

In fact, that is the first thing I would look at.  pyodbc doesn't convert ANSI 
to
Unicode on its own, so the database driver must be returning them as Unicode.  I
believe the MySQL driver has some keywords to control this.

In general, pyodbc never converts parameters -- it passes what you give 
directly to
the driver and tells the driver whether the value is ANSI or Unicode.  

*However*, on OS/X, the Python Unicode data type is 4 bytes while wchar_t is 
only 2
bytes, so there is some conversion going on.  If I remember correctly, I call a
Python API to perform the conversion, perhaps it is incorrect.

To find out for sure, generate a small ODBC trace for the test above and post 
it. 
That will tell me everything I need to know.

Original comment by mkleehammer on 15 Jan 2009 at 8:31

GoogleCodeExporter commented 8 years ago
sure thing....

Original comment by zzz...@gmail.com on 16 Jan 2009 at 4:06

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks for the trace.  Here's what I see:

The first execute definitely binds the parameter as a Unicode string, which we 
can
see by looking at the SQLBindParameter call and noting the data type: 
SQL_C_WCHAR. 
(For those that don't know, 'WCHAR' stands for 'wide character', meaning 
Unicode.) 
The second execute binds using SQL_C_CHAR, so it is ANSI.

These are as I would expect.

What is unexpected is the fetch for both returns ANSI strings, even though the 
first
execute should simply return the Unicode parameter.  One of your previous emails
seemed to say it always returns Unicode, but it looks like it should always 
return a
string instead.  Can you confirm?  (Perhaps print cursor.description and show 
the
results of the print cursor.fetchone())

Perhaps we shouldn't be surprised, however, since this is a strange situation.  
At
the point that the query is compiled, there is no way for the MySQL driver to 
know
the datatype of the parameter you are going to pass in the future.  A better 
test
might be to create a table with a Unicode column and ANSI column and print a 
row from
that.

Unfortunately, both of these issues may be different than the one you opened the
issue for, and that is the parameter in a where clause.  That issue seems to be
database/driver dependent.  The trace you've provided confirms that pyodbc 
passes the
parameters as you provide them: unicode objects are passed as WCHAR and string
objects are passed as CHAR.  My suggestion would be to always pass the datatype 
that
matches the target column.

How is it that they do not match?  Are you using ANSI MySQL tables but Unicode 
in
your Python program?

Original comment by mkleehammer on 16 Jan 2009 at 7:30

GoogleCodeExporter commented 8 years ago
the issue is, if I use a regular old MySQL table, put a regular old string in 
it, I
get back a Python unicode object in the result set.  Using that result as a bind
parameter then fails since unicodes seem to get blown away on the way back in.

Heres a test/trace:

    import pyodbc
    c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")

    cursor = c.cursor()

    cursor.execute("drop table foo")
    cursor.execute("create table foo(data varchar(30))")
    cursor.execute("insert into foo values (?)", 'test')
    cursor.execute("select * from foo")
    result = cursor.fetchone()[0]
    assert isinstance(result, unicode)

    cursor.execute("select ?", result)
    r2 = cursor.fetchone()[0]
    assert result == r2

Original comment by zzz...@gmail.com on 16 Jan 2009 at 8:12

Attachments:

GoogleCodeExporter commented 8 years ago
Your trace confirms that this is out of my control.  The data *really* does 
come back
as Unicode data:

  [000000.031062]
  Python A0313720 EXIT  SQLDescribeCol with return code 0 (SQL_SUCCESS)
  ...
  SQLSMALLINT     * 0xbfffedea (SQL_WVARCHAR)

There really isn't anything I can do in pyodbc -- there must be some kind of 
server
or driver setting for this.

What version of MySQL are you running and what driver are you using?  You can
determine the driver using:

  cnxn.getinfo(pyodbc.SQL_DRIVER_NAME)
  cnxn.getinfo(pyodbc.SQL_DRIVER_VER)

I'm going to leave this issue open for a bit while we investigate, but I don't 
think
there are any changes I can make -- pyodbc takes the data as it is sent from the
database/driver.

Original comment by mkleehammer on 24 Jan 2009 at 3:55

GoogleCodeExporter commented 8 years ago
I'm going to have to close this since there really isn't anything I can do.  If 
you
found any more information, feel free to post here or at the pyodbc news group 
since
it may help others.

Sorry I couldn't do more.

Original comment by mkleehammer on 18 Mar 2009 at 5:18