zillow / ctds

Python DB-API 2.0 library for MS SQL Server
MIT License

There is no type wrapper for NVARCHAR #2

Closed sunghoonyang closed 7 years ago

sunghoonyang commented 7 years ago

I have looked through the ctds documentation and was not able to find the best practice for NVARCHAR columns. My situation is that one column holds strings containing both English and Korean. By printing the column value for each row, I was able to confirm that Python handles the values as Unicode Korean characters. The problem is that when I use bulk_insert, the text comes out interpreted as Japanese characters.

Prior to the bulk_insert I change the session language to Korean ("SET LANGUAGE KOREAN"), and the table definition collates the column as "Korean_Wansung_CI_AS", but so far this has been unsuccessful. I'm also curious why, even though I cannot specify the N prefix (as in N'nvarchar'), the text gets interpreted as Japanese characters rather than simply coming out as question marks, like this: ????englishword????

Any guidelines?

joshuahlang commented 7 years ago

I'm currently working on including an explicit NVarChar type wrapper, and additionally on changing the default Python unicode -> SQL translation to target NVARCHAR (at least when running against FreeTDS >= 0.95).

However, this won't address the issue you're seeing with Connection.bulk_insert. That code uses different APIs in FreeTDS, and unfortunately those APIs don't properly support NVARCHAR characters. In other words, this is a bug in FreeTDS.

sunghoonyang commented 7 years ago

hi joshuahlang

We resolved this issue with Connection.bulk_insert by encoding the strings as UTF-16 and using the SqlVarchar type wrapper. The Korean characters come out perfectly.

joshuahlang commented 7 years ago

I've updated the documentation to include the above work-around.

http://pythonhosted.org/ctds/bulk_insert.html
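For readers who want to see the shape of the work-around, here is a rough sketch rather than the library's official example. It assumes the wrapper referred to in this thread is exposed as `ctds.SqlVarChar` and that `Connection.bulk_insert` takes a table name and an iterable of row tuples. The helper names `to_bulk_rows` and `bulk_insert_unicode` are hypothetical. It encodes with `utf-16le` explicitly, because Python's generic `utf-16` codec prepends a byte order mark.

```python
def to_bulk_rows(rows, wrap):
    """Return rows with every str value encoded as UTF-16LE bytes and wrapped.

    `wrap` is whatever callable should receive the encoded bytes; with ctds
    this is assumed to be ctds.SqlVarChar. Non-str values pass through as-is.
    """
    return [
        tuple(
            wrap(value.encode("utf-16le")) if isinstance(value, str) else value
            for value in row
        )
        for row in rows
    ]


def bulk_insert_unicode(connection, table, rows):
    """Hypothetical helper: bulk insert rows whose text targets NVARCHAR columns."""
    # Imported lazily so to_bulk_rows can be used without ctds installed.
    import ctds

    # Assumption: ctds.SqlVarChar accepts bytes and sends them unmodified.
    return connection.bulk_insert(table, to_bulk_rows(rows, ctds.SqlVarChar))
```

With an open ctds connection, a call might look like `bulk_insert_unicode(connection, 'MyTable', [(1, '한글 english')])`, where `MyTable` is a placeholder table name.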

IliaLukin commented 6 years ago

UTF-16 is the wrong encoding: it prepends a byte order mark ("a code point with the value U+FEFF"). You need utf-16le or utf-16be.

joshuahlang commented 6 years ago

What is this comment in reference to?

IliaLukin commented 6 years ago

If a user needs to use Connection.bulk_insert, they must use the SqlVarchar type wrapper together with utf-16le or utf-16be, not utf-16.
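The difference between the codecs can be checked in plain Python without a database. This small sketch uses only the standard library; the sample string is arbitrary.

```python
import codecs

text = "한글 english"

with_bom = text.encode("utf-16")   # generic codec: prepends a byte order mark
little = text.encode("utf-16le")   # explicit little-endian, no BOM
big = text.encode("utf-16be")      # explicit big-endian, no BOM

boms = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The generic codec output starts with U+FEFF in the platform's byte order.
has_bom = with_bom[:2] in boms
# The explicit codecs emit only the text itself.
le_has_bom = little[:2] in boms

# The BOM accounts for exactly two extra bytes.
extra_bytes = len(with_bom) - len(little)

print(has_bom, le_has_bom, extra_bytes)  # prints: True False 2
```

Those two leading bytes would be stored as an extra character at the start of every inserted value, which is why the explicit byte-order codecs are the safer choice here.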