jakesylvestre / pyodbc

Automatically exported from code.google.com/p/pyodbc
MIT No Attribution

pyodbc 3.0.6-beta01 + osx 64 bit + freetds 0.91 returns blank string for multibyte unicode #247

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Still can't really use FreeTDS 0.91 with pyodbc on OSX. The latest pyodbc also
no longer works with FreeTDS 0.82, but I know we'd like to get off that
someday anyway, so we'll skip that for now.

With 0.91, I can at least pass a u'' string as a bound value to execute()
without getting "Invalid data type". However, if the value contains non-ASCII
characters, we now get a blank u'' string back.

# coding: utf-8

import pyodbc
print pyodbc.version

unicodedata = u"Alors vous imaginez ma surprise, au lever du jour, "\
            u"quand une drôle de petite voix m’a réveillé. Elle "\
            u"disait: « S’il vous plaît… dessine-moi un mouton! »"

conn = pyodbc.connect(dsn="ms_2005", user="scott", password="tiger")

cursor = conn.cursor()

cursor.execute("""
create table uni_round (
    data nvarchar(500)
)
""")

cursor.execute("""
    insert into uni_round (data) values (?)
""", (unicodedata.encode('utf-8'),))

cursor.execute("select data from uni_round")
result = cursor.fetchone()[0]
# here, result is u''
assert result == unicodedata, result

classics-MacBook-Pro:sqlalchemy classic$ python test.py
3.0.6-beta01
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    assert result == unicodedata, result
AssertionError

The freetds.conf file of course has "client charset = UTF-8" as always.

Original issue reported on code.google.com by zzz...@gmail.com on 14 Mar 2012 at 6:02

GoogleCodeExporter commented 9 years ago
Sorry, that .encode() wasn't intended, though the result is the same.  Take out 
the encode(), same result:

# coding: utf-8

import pyodbc
print pyodbc.version

unicodedata = u"Alors vous imaginez ma surprise, au lever du jour, "\
            u"quand une drôle de petite voix m’a réveillé. Elle "\
            u"disait: « S’il vous plaît… dessine-moi un mouton! »"

conn = pyodbc.connect(dsn="ms_2005", user="scott", password="tiger")

cursor = conn.cursor()

cursor.execute("""
create table uni_round (
    data nvarchar(500)
)
""")

cursor.execute("""
    insert into uni_round (data) values (?)
""", (unicodedata,))

cursor.execute("select data from uni_round")
result = cursor.fetchone()[0]
assert result == unicodedata, result

Original comment by zzz...@gmail.com on 14 Mar 2012 at 6:08

GoogleCodeExporter commented 9 years ago
The freetdstests.py unit tests pass using the following:

* OS/X 10.8 (Mountain Lion)
* SQL Server 2012 Express on Windows 7
* Default Apple Python
* FreeTDS 0.91, compiled from source
* pyodbc 3.0.7-beta08

I don't believe there are any changes since 3.0.6 that would have fixed 
anything related.

I also added the following test and it passed:

    def test_unicode2(self):
        """
        From Google Code Issue 247.  (Replaced the smart quotes and ellipsis)
        """
        value = u"""Alors vous imaginez ma surprise, au lever du jour,
                    quand une drôle de petite voix m'a réveillé. Elle
                    disait: « S'il vous plaît... dessine-moi un mouton! »"""
        self.cursor.execute("create table t1(s nvarchar(500))")
        self.cursor.execute("insert into t1 values(?)", value)
        v = self.cursor.execute("select * from t1").fetchone()[0]
        self.assertEqual(type(v), unicode)
        self.assertEqual(v, value)

Are you still having problems?

Original comment by mkleehammer on 27 Sep 2012 at 10:14

GoogleCodeExporter commented 9 years ago

Original comment by mkleehammer on 29 Sep 2012 at 4:59

GoogleCodeExporter commented 9 years ago
Thanks. I'll have to find the time to install 0.91 again and get everything
going, but if you are not seeing the issue on your end, that's encouraging.
Is your test using "nvarchar" as the type for the column?

Original comment by zzz...@gmail.com on 29 Sep 2012 at 5:10

GoogleCodeExporter commented 9 years ago
Still having issues: I get back a string, but the encoding is wrong.

- Python 2.7.3 built from source, as well as Python 3.3.0 built from source
- OSX Mountain Lion
- FreeTDS 0.91
- pyodbc 3.0.7-beta10
- freetds.conf has:

        [ms_2005]
        host = 172.16.248.128
        port = 1213
        tds version = 8.0
        client charset = UTF8
        text size = 50000000

Looking at PDB this is what I'm currently seeing for 2.7 (the assertion doesn't 
print anything for some reason):

(Pdb) !result
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xc3\xb4le de 
petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: \xc2\xab 
S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! \xc2\xbb'
(Pdb) !unicodedata
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de 
petite voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous 
pla\xeet\u2026 dessine-moi un mouton! \xbb'

I get a similar result for 3.3 (the assertion error prints):

AssertionError: Alors vous imaginez ma surprise, au lever du jour, quand une 
dr\xc3\xb4le de petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: 
\xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! 
\xc2\xbb 

!= 

Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de petite 
voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous pla\xeet\u2026 
dessine-moi un mouton! \xbb

Original comment by zzz...@gmail.com on 2 Apr 2013 at 10:29

GoogleCodeExporter commented 9 years ago
Yeah, I'm trying every flag there is. Here's some other detail:

- the Python builds are 64 bit
- I'm using iODBC, not unixODBC, version 3.52.7

the value coming back from FreeTDS is clearly already utf-8 encoded.  If I try 
to force "UCS2" or "UCS4" in the freetds.conf file, the whole program just 
crashes:

Assertion failed: (0), function tds7_send_login, file login.c, line 905.
Abort trap: 6

If you leave "client charset" out, then FreeTDS defaults to iso-8859-1, and as
expected I get an encoded iso-8859-1 string inside the u'' instead of a utf-8 one.
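
The mangled value seen in Pdb is consistent with each UTF-8 byte being widened into its own code point. Here is a minimal sketch that reproduces the same symptom in pure Python on a shortened version of the test string, with no database involved, and then reverses it. The latin-1 round trip is only an assumption about what the driver stack effectively does; it is not taken from the pyodbc or FreeTDS source.

```python
# -*- coding: utf-8 -*-
# Sketch only: simulates the observed symptom without touching ODBC.
# Assumption: the driver hands back UTF-8 bytes and each byte ends up
# as one code point in the resulting unicode object.

original = u"dr\xf4le m\u2019a r\xe9veill\xe9."

# Encode to UTF-8, then widen every byte to a code point (latin-1 maps
# byte N to U+00NN), mimicking "UTF-8 inside a u'' string".
mangled = original.encode("utf-8").decode("latin-1")

# Reverse the widening: back to the raw bytes, then decode them as UTF-8.
repaired = mangled.encode("latin-1").decode("utf-8")

print(repr(mangled))
print(repaired == original)
```

The reversal only works while every code point in the mangled value is below U+0100, so this is a diagnostic aid for confirming the theory rather than a fix.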

Original comment by zzz...@gmail.com on 5 Apr 2013 at 4:25

GoogleCodeExporter commented 9 years ago
just tried the built-in Apple Python, getting the same result.

Original comment by zzz...@gmail.com on 5 Apr 2013 at 4:28

GoogleCodeExporter commented 9 years ago
OK, researching my iODBC setup, I think I have both 3.52.6 and 3.52.7
installed; will try to work out which is in use.

Original comment by zzz...@gmail.com on 5 Apr 2013 at 4:35

GoogleCodeExporter commented 9 years ago
3.52.6

Original comment by zzz...@gmail.com on 5 Apr 2013 at 4:48

GoogleCodeExporter commented 9 years ago
I'm just beginning to understand the source here, and as I believe you've
mentioned earlier, pyodbc assumes that data being returned is in UCS-2 format.
Interestingly, when I run this script on a Fedora platform with unixODBC
and FreeTDS 0.91, I get the correct result. Looking in the source, I don't see
pyodbc doing anything at all with encodings: it moves the data straight
from what SQLGetData() gives it into a Python Unicode object, though I don't
yet understand the buffering logic going on.

The strange thing here is that, per FreeTDS's documentation here:
http://freetds.schemamania.org/userguide/localization.htm, this shouldn't work
at all: you will always be getting the data either as UTF-8 or ISO-8859-1
(the default), unless you set UCS-2 in freetds.conf. That works on neither
OSX nor Linux; you get a core dump.

Admitting that I'm still totally in the dark here, it seems like FreeTDS +
unixODBC on Linux is not actually honoring "client charset" whereas FreeTDS +
iODBC on OSX is, hence on OSX I get UTF-8 shoved into a u'' string.

Original comment by zzz...@gmail.com on 5 Apr 2013 at 10:33

GoogleCodeExporter commented 9 years ago
Also supporting this: if I use an inadequate encoding, like WINDOWS-1251, on
OSX I get u'dr?le m\x92a r?veill?', while on Linux I still get the full string.
"client charset" is somehow having no effect on Linux (unless I change it to a
"broken" encoding, like UCS-2 or UTF-16, in which case it core dumps).

Original comment by zzz...@gmail.com on 6 Apr 2013 at 7:43
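
The '?' and '\x92' pattern reported on OSX is what a lossy conversion into WINDOWS-1251 would look like. A small sketch, assuming unrepresentable characters are replaced with '?' (Python's "replace" error handler does this; whether FreeTDS's conversion path behaves identically is an assumption):

```python
# -*- coding: utf-8 -*-
# Sketch only: pure-Python simulation, no ODBC involved.

original = u"dr\xf4le m\u2019a r\xe9veill\xe9"

# cp1251 (WINDOWS-1251) is a Cyrillic code page: it has the right single
# quotation mark at byte 0x92 but no accented Latin letters, so those
# fall back to '?'.
as_cp1251 = original.encode("cp1251", "replace")

# Widen each byte to one code point, as in the UTF-8 case.
widened = as_cp1251.decode("latin-1")

print(repr(widened))
```

If this matches what the driver does, it would suggest that on OSX the charset conversion really is applied and the resulting bytes are then stored one per code point.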

GoogleCodeExporter commented 9 years ago
OK, I've now run this pyodbc build against the following test:

# coding: utf-8

import imp
pyodbc = imp.load_dynamic("pyodbc", 
"build/lib.macosx-10.4-x86_64-2.7/pyodbc.so")

unicodedata = u"drôle m’a réveillé."

conn = pyodbc.connect(u"DSN=ms_2005;UID=scott;PWD=tiger")

cursor = conn.cursor()

cursor.execute("select ?", (unicodedata, ))
result = cursor.fetchone()[0]
print "original data:        %r" % unicodedata
print "received from pyodbc: %r" % result

All on OSX, FreeTDS 0.91:

Result on iODBC 3.52.6:

classics-MacBook-Pro:pyodbc classic$ python test.py
original data:        u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u'dr\xc3\xb4le m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9.'

Result on iODBC 3.52.7, 3.52.8 on master (these are via various tags at 
https://github.com/openlink/iODBC/tree/develop/iodbc), as well as unixODBC 
2.3.1 (for each build, I tested pyodbc.so with otool -L to ensure it built to 
the correct library):

original data:        u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u''

What's going on in all those others is that the driver isn't handling the u''
string at all. If I change it to u'hi', I get this:

classics-MacBook-Pro:pyodbc classic$ python test.py
original data:        u'hi'
received from pyodbc: u'\ufffd\x00'

What freetds.log shows in all the non-working cases that isn't in the 3.52.6 
log is this, right before it attempts to send the statement along with the 
bound parameter:

17:54:26.627963 34615 (util.c:331):tdserror(0x1003a3480, 0x1003c37f0, 2402, 0)
17:54:26.627968 34615 (odbc.c:2270):msgno 2402 20003
17:54:26.627973 34615 (util.c:361):tdserror: client library returned 
TDS_INT_CANCEL(2)
17:54:26.627978 34615 (util.c:384):tdserror: returning TDS_INT_CANCEL(2)

This test seems to illustrate an issue at least with sending the string, and 
possibly receiving it as well.

Original comment by zzz...@gmail.com on 6 Apr 2013 at 10:14

GoogleCodeExporter commented 9 years ago
Running tests2/freetdstests.py causes a core dump for me if I keep the
encoding set to UTF-8 in freetds.conf; one of the tests is doing something it
doesn't like. The test_unicode2 you have above fails:

classics-MacBook-Pro:pyodbc classic$ python tests2/freetdstests.py 
"DSN=ms_2005;UID=scott;PWD=tiger" -t test_unicode2
python:  2.7.3 (default, Feb 14 2013, 14:25:59) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)]
pyodbc:  3.0.7-beta10 
/usr/local/src/pyodbc/build/lib.macosx-10.4-x86_64-2.7/pyodbc.so
odbc:    03.52.0000
driver:  libtdsodbc.so 0.91
         supports ODBC version 03.50
os:      Darwin
unicode: Py_Unicode=2 SQLWCHAR=4
======================================================================
FAIL: test_unicode2 (__main__.FreeTDSTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests2/freetdstests.py", line 1166, in test_unicode2
    self.assertEqual(v, value)
AssertionError: u'' != u"Alors vous imaginez ma surprise, au lever du jour,\n   
                 quand  [truncated]...
+ Alors vous imaginez ma surprise, au lever du jour,
+                     quand une dr\xf4le de petite voix m'a r\xe9veill\xe9. Elle
+                     disait: \xab S'il vous pla\xeet... dessine-moi un mouton! 
\xbb

----------------------------------------------------------------------
Ran 1 test in 0.021s

FAILED (failures=1)

Original comment by zzz...@gmail.com on 6 Apr 2013 at 10:22

GoogleCodeExporter commented 9 years ago
Here's one way I *can* make it work:

1. Use tds version = 8.0, not 7.0.

2. Cast the data to non-unicode first (and include a length, for some reason);
you can then get it back as bytes:

cursor.execute("select cast(data as varchar(200)) from uni_round")
result = cursor.fetchone()[0]
assert result.decode('utf-8') == unicodedata, result
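
The decode step of this workaround can be wrapped so callers always receive text. A sketch, assuming the driver returns a byte string for the varchar cast as described above; `to_text` is a hypothetical helper and not part of pyodbc, and the `raw` value stands in for `cursor.fetchone()[0]`:

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string.

    Hypothetical helper for illustration; not a pyodbc API.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (or None, a number, etc.): pass through unchanged.
    return value


expected = u"dr\xf4le m\u2019a r\xe9veill\xe9."

# Stand-in for the byte string fetched after the cast to varchar.
raw = expected.encode("utf-8")

print(to_text(raw) == expected)
```

Passing values that are already unicode through untouched means the same call site should keep working on setups (such as the Fedora one) where the driver already returns correct text.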

Original comment by zzz...@gmail.com on 5 Aug 2013 at 6:35