Open GoogleCodeExporter opened 8 years ago
Sorry, that .encode() wasn't intended, though the result is the same. Take out
the encode(), same result:
# coding: utf-8
import pyodbc
print pyodbc.version
unicodedata = u"Alors vous imaginez ma surprise, au lever du jour, "\
u"quand une drôle de petite voix m’a réveillé. Elle "\
u"disait: « S’il vous plaît… dessine-moi un mouton! »"
conn = pyodbc.connect(dsn="ms_2005", user="scott", password="tiger")
cursor = conn.cursor()
cursor.execute("""
create table uni_round (
data nvarchar(500)
)
""")
cursor.execute("""
insert into uni_round (data) values (?)
""", (unicodedata,))
cursor.execute("select data from uni_round")
result = cursor.fetchone()[0]
assert result == unicodedata, result
Original comment by zzz...@gmail.com
on 14 Mar 2012 at 6:08
The freetdstests.py unit tests pass using the following:
* OS/X 10.8 (Mountain Lion)
* SQL Server 2012 Express on Windows 7
* Default Apple Python
* FreeTDS 0.91, compiled from source
* pyodbc 3.0.7-beta08
I don't believe there are any changes since 3.0.6 that would have fixed
anything related.
I also added the following test and it passed:
def test_unicode2(self):
    """
    From Google Code Issue 247. (Replaced the smart quotes and ellipsis)
    """
    value = u"""Alors vous imaginez ma surprise, au lever du jour,
quand une drôle de petite voix m'a réveillé. Elle
disait: « S'il vous plaît... dessine-moi un mouton! »"""
    self.cursor.execute("create table t1(s nvarchar(500))")
    self.cursor.execute("insert into t1 values(?)", value)
    v = self.cursor.execute("select * from t1").fetchone()[0]
    self.assertEqual(type(v), unicode)
    self.assertEqual(v, value)
Are you still having problems?
Original comment by mkleehammer
on 27 Sep 2012 at 10:14
Original comment by mkleehammer
on 29 Sep 2012 at 4:59
thanks. I'll have to get the time to install 0.91 again and get everything
going, but if you are not seeing the issue on your end, that's encouraging.
is your test using "nvarchar" as the type for the column ?
Original comment by zzz...@gmail.com
on 29 Sep 2012 at 5:10
still having issues, I get back a string, but the encoding is wrong:
- Python 2.7.3 built from source, as well as Python 3.3.0 built from source
- OSX mountain lion
- FreeTDS 0.91
- Pyodbc 3.0.7-beta10
- Freetds.conf has:
[ms_2005]
host = 172.16.248.128
port = 1213
tds version = 8.0
client charset = UTF8
text size = 50000000
Looking at PDB this is what I'm currently seeing for 2.7 (the assertion doesn't
print anything for some reason):
(Pdb) !result
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xc3\xb4le de
petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: \xc2\xab
S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! \xc2\xbb'
(Pdb) !unicodedata
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de
petite voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous
pla\xeet\u2026 dessine-moi un mouton! \xbb'
I get a similar result for 3.3 (the assertion error prints):
AssertionError: Alors vous imaginez ma surprise, au lever du jour, quand une
dr\xc3\xb4le de petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait:
\xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton!
\xc2\xbb
!=
Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de petite
voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous pla\xeet\u2026
dessine-moi un mouton! \xbb
Original comment by zzz...@gmail.com
on 2 Apr 2013 at 10:29
yeah I'm trying every flag there is, here's some other detail:
- the Python builds are 64 bit
- I'm using iODBC, not unixodbc, version 3.52.7
the value coming back from FreeTDS is clearly already utf-8 encoded. If I try
to force "UCS2" or "UCS4" in the freetds.conf file, the whole program just
crashes:
Assertion failed: (0), function tds7_send_login, file login.c, line 905.
Abort trap: 6
if you leave "client charset" out, then freetds defaults to iso-8859-1, and as
expected I get an iso-8859-1 encoded string inside the u'' instead of a utf-8 one.
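A minimal sketch of what "already utf-8 encoded" means here, assuming the bytes
from the driver get widened one byte per character into the unicode object. That
mechanism is an assumption, and widen() and repair() are made-up helper names,
not pyodbc or FreeTDS functions:

```python
# Illustration only: widen() and repair() are hypothetical helpers,
# not part of pyodbc or FreeTDS.

original = u"dr\xf4le de petite voix m\u2019a r\xe9veill\xe9"


def widen(text, charset):
    """Encode text in the client charset, then treat each byte as a code point."""
    return text.encode(charset).decode("latin-1")


def repair(garbled, charset):
    """Undo widen(): turn the code points back into bytes and decode them."""
    return garbled.encode("latin-1").decode(charset)


garbled = widen(original, "utf-8")
print(repr(garbled))
print(repair(garbled, "utf-8") == original)
```

Under that assumption the garbled value has the same escapes as the Pdb output
above (dr\xc3\xb4le, m\xe2\x80\x99a, r\xc3\xa9veill\xc3\xa9), and going back
through latin-1 and then utf-8 recovers the original text.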
Original comment by zzz...@gmail.com
on 5 Apr 2013 at 4:25
just tried the built-in Apple Python, getting the same result.
Original comment by zzz...@gmail.com
on 5 Apr 2013 at 4:28
OK researching my iodbc setup, I think I have 3.52.6 and 3.52.7 both installed,
will try to reconcile which is in use.
Original comment by zzz...@gmail.com
on 5 Apr 2013 at 4:35
3.52.6
Original comment by zzz...@gmail.com
on 5 Apr 2013 at 4:48
I'm just beginning to understand the source here, and I believe you've
mentioned earlier, pyodbc assumes that data being returned is in UCS-2 format.
And interestingly, when I run this script on a Fedora platform with unixodbc
and freetds 0.91, I get the correct result. Looking in the source, I don't see
pyodbc doing anything at all with encodings - it is moving the data straight
from what SQLGetData() gives it into a Python Unicode object, though I don't
yet understand the buffering logic going on.
The strange thing here is that, per FreeTDS's documentation here:
http://freetds.schemamania.org/userguide/localization.htm, this shouldn't work
at all - you will always be getting the data either as UTF-8, or ISO-8859-1
(the default), unless you set UCS-2 in freetds.conf. That does not work
on either OSX or Linux; you get a core dump.
Admitting that I'm still totally in the dark here, it seems like FreeTDS +
UnixODBC on Linux is not actually honoring "client charset" whereas FreeTDS +
iODBC on OSX is, hence on OSX I get UTF-8 shoved into a u'' string.
Original comment by zzz...@gmail.com
on 5 Apr 2013 at 10:33
also supporting this, if I use an inadequate encoding, like WINDOWS-1251, on
OSX I get: u'dr?le m\x92a r?veill?', on Linux I still get the full string -
"client charset" is somehow having no effect on linux (unless I change it to a
"broken" encoding, like UCS-2 or UTF-16 - then it core dumps).
Original comment by zzz...@gmail.com
on 6 Apr 2013 at 7:43
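The WINDOWS-1251 result in the comment above can be reproduced in plain Python.
This is only a sketch of one reading of that output, not a trace of what FreeTDS
actually does: it assumes the text is converted to the client charset with
unencodable characters replaced by "?", and that the resulting bytes are then
widened one byte per character.

```python
# Illustration only: the replace-with-"?" conversion and the byte-per-character
# widening are assumptions about the OSX behaviour, not documented FreeTDS steps.

original = u"dr\xf4le m\u2019a r\xe9veill\xe9"

# "cp1251" is Python's codec name for WINDOWS-1251. The accented letters are
# not in that charset, so "replace" turns them into "?"; the apostrophe
# U+2019 is in the charset, at byte 0x92.
lossy_bytes = original.encode("cp1251", "replace")

# Widen each byte to the code point with the same number.
widened = lossy_bytes.decode("latin-1")
print(widened == u"dr?le m\x92a r?veill?")

# Decoding with the right charset brings back the apostrophe, but the
# replaced letters are gone for good.
partly = lossy_bytes.decode("cp1251")
print(partly == u"dr?le m\u2019a r?veill?")
```

The point of the second half is that, unlike the UTF-8 case, there is no way to
repair this value on the Python side once the conversion has dropped characters.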
OK I've now tested this Pyodbc against the following test:
# coding: utf-8
import imp
pyodbc = imp.load_dynamic("pyodbc",
"build/lib.macosx-10.4-x86_64-2.7/pyodbc.so")
unicodedata = u"drôle m’a réveillé."
conn = pyodbc.connect(u"DSN=ms_2005;UID=scott;PWD=tiger")
cursor = conn.cursor()
cursor.execute("select ?", (unicodedata, ))
result = cursor.fetchone()[0]
print "original data: %r" % unicodedata
print "received from pyodbc: %r" % result
All on OSX, FreeTDS 0.91:
Result on iODBC 3.52.6:
classics-MacBook-Pro:pyodbc classic$ python test.py
original data: u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u'dr\xc3\xb4le m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9.'
Result on iODBC 3.52.7, 3.52.8 on master (these are via various tags at
https://github.com/openlink/iODBC/tree/develop/iodbc), as well as unixODBC
2.3.1 (for each build, I tested pyodbc.so with otool -L to ensure it built to
the correct library):
original data: u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u''
What's going on in all those others is that the driver isn't handling the u''
string at all, if I change it to u'hi' I get this:
classics-MacBook-Pro:pyodbc classic$ python test.py
original data: u'hi'
received from pyodbc: u'\ufffd\x00'
What freetds.log shows in all the non-working cases that isn't in the 3.52.6
log is this, right before it attempts to send the statement along with the
bound parameter:
17:54:26.627963 34615 (util.c:331):tdserror(0x1003a3480, 0x1003c37f0, 2402, 0)
17:54:26.627968 34615 (odbc.c:2270):msgno 2402 20003
17:54:26.627973 34615 (util.c:361):tdserror: client library returned
TDS_INT_CANCEL(2)
17:54:26.627978 34615 (util.c:384):tdserror: returning TDS_INT_CANCEL(2)
This test seems to illustrate an issue at least with sending the string, and
possibly receiving it as well.
Original comment by zzz...@gmail.com
on 6 Apr 2013 at 10:14
[deleted comment]
Running the tests2/freetdstests.py causes a core dump for me if I keep the
encoding on UTF-8 in freetds.conf, one of the tests is doing something it
doesn't like. For the test_unicode2 you have above, it fails:
classics-MacBook-Pro:pyodbc classic$ python tests2/freetdstests.py
"DSN=ms_2005;UID=scott;PWD=tiger" -t test_unicode2
python: 2.7.3 (default, Feb 14 2013, 14:25:59)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)]
pyodbc: 3.0.7-beta10
/usr/local/src/pyodbc/build/lib.macosx-10.4-x86_64-2.7/pyodbc.so
odbc: 03.52.0000
driver: libtdsodbc.so 0.91
supports ODBC version 03.50
os: Darwin
unicode: Py_Unicode=2 SQLWCHAR=4
======================================================================
FAIL: test_unicode2 (__main__.FreeTDSTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests2/freetdstests.py", line 1166, in test_unicode2
self.assertEqual(v, value)
AssertionError: u'' != u"Alors vous imaginez ma surprise, au lever du jour,\n
quand [truncated]...
+ Alors vous imaginez ma surprise, au lever du jour,
+ quand une dr\xf4le de petite voix m'a r\xe9veill\xe9. Elle
+ disait: \xab S'il vous pla\xeet... dessine-moi un mouton!
\xbb
----------------------------------------------------------------------
Ran 1 test in 0.021s
FAILED (failures=1)
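The banner above reports Py_Unicode=2 and SQLWCHAR=4. Purely as a hypothesis
(nothing in this thread confirms it), a mismatch between 2-byte and 4-byte
character widths would produce the kind of value seen earlier for u'hi'. A
sketch, assuming a buffer of UTF-16 code units followed by four zero bytes gets
read as 4-byte units:

```python
# Hypothetical illustration, not a confirmed diagnosis: read 2-byte code
# units as if they were 4-byte ones. The zero padding is an assumption.

text = u"hi"

# Two UTF-16 code units (4 bytes), then 4 assumed bytes of zero padding.
buf = text.encode("utf-16-le") + b"\x00" * 4

# Read as 4-byte little-endian units, the first unit is 0x00690068, which is
# above the Unicode range, so "replace" substitutes U+FFFD. The padding
# decodes to U+0000.
misread = buf.decode("utf-32-le", "replace")

print(misread == u"\ufffd\x00")
```

This only shows that a width mismatch is consistent with u'\ufffd\x00'; it says
nothing about where in iODBC, FreeTDS or pyodbc such a mismatch would happen.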
Original comment by zzz...@gmail.com
on 6 Apr 2013 at 10:22
here's one way I *can* make it work:
1. use tds version = 8.0, not 7.0
2. cast the data to non-unicode first (and include a length, for some reason),
you can get it back as bytes:
cursor.execute("select cast(data as varchar(200)) from uni_round")
result = cursor.fetchone()[0]
assert result.decode('utf-8') == unicodedata, result
Original comment by zzz...@gmail.com
on 5 Aug 2013 at 6:35
Original issue reported on code.google.com by
zzz...@gmail.com
on 14 Mar 2012 at 6:02