mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.89k stars 562 forks source link

pyodbc execute not supporting insert special symbols like trademark(™) symbol and other unicode( ’ ) , others #1091

Closed shravankumargoud closed 1 year ago

shravankumargoud commented 1 year ago

Please first make sure you have looked at:

Environment

To diagnose, we usually need to know the following, including version numbers. On Windows, be sure to specify 32-bit Python or 64-bit:

Issue

When I try to insert a value "B™¥R™ Z" using pyodbc execute, it stores in db as "BÂ¥R Z"
The same insert works good with plain insert command in dB.

Seeing the forum, here i tried, multiple encoding and decoding methods, but none worked for me. Is there a way to skip the decoding or encoding performed by connection to execute the query.

Thanks.

v-makouz commented 1 year ago

What's the ODBC driver being used?

shravankumargoud commented 1 year ago

IBM i Access ODBC Driver

v-chojas commented 1 year ago

I think somewhere in the process there is a confusion between CP1252 and ISO-8859-1 encoding. The 'TM' character is present in the former (at location 99h) but not the latter.

shravankumargoud commented 1 year ago

@v-chojas : The issue is not only with 'TM' other chars like in HMF Women’s Daily are displayed like HMF Womens Daily

v-chojas commented 1 year ago

That is consistent with the confusion between CP1252 and ISO-8859-1, as that character is not present in the latter either. What encoding settings have you tried?

shravankumargoud commented 1 year ago

cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8') cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8') cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le') cnxn.setencoding(encoding='utf-8')

Also tried similarly with iso-8859-1

v-chojas commented 1 year ago

I'm not familiar with your driver but you should look at its documentation for what encodings it expects to use at the ODBC interface. Try CP1252/Windows-1252 instead.

shravankumargoud commented 1 year ago

from the driver documentation, https://www.ibm.com/docs/en/db2/11.5?topic=managers-datadirect-odbc-driver-manager, it uses UTF-8, I tried with CP1252 and Windows-1252, but no luck.

v-chojas commented 1 year ago

pyODBC on Windows wouldn't be using DataDirect's driver manager, but the Windows one. Set the SQL_WCHAR encoding to UCS-2 or UTF-16 and try again.

shravankumargoud commented 1 year ago

I tried with options: setdecoding(po.SQL_WCHAR, encoding='UCS-2') ValueError: not a registered codec: 'UCS-2'

and with UTF-16, no luck, same boxes

shravankumargoud commented 1 year ago

@v-chojas , @v-makouz , @mkleehammer : Any recommendations here, thanks

gordthompson commented 1 year ago

Can you locate an existing row in the table that contains either '™' or '’' and displays correctly in a DB2-specific tool (akin to psql or PGAdmin4 for Postgresql)? If so, what do you get when you try to retrieve that row with pyodbc's default encoding settings?

keitherskine commented 1 year ago

Just a thought, but if this is an IBM database, perhaps the database is expecting EBCDIC-encoded values.

Also, as a workaround, perhaps you can provide the TM symbol as an EBCDIC_CHAR function, e.g. "BÂ" + EBCDIC_CHAR(175) + "¥RÂ" + EBCDIC_CHAR(175) + " Z" (I may have the wrong syntax there).

gordthompson commented 1 year ago

(Ha! I had completely forgotten about EBCDIC. :smile:)

shravankumargoud commented 1 year ago

Thanks a lot for all the recommendations, Finally I got through this by passing the values in a tuple like below: cursor.execute(insert_query, insert_query_values)

insert_query = insert into table column1, column2, colum3 values (?, ?, ?) insert_query_values = (id, name, description)