yehoshuadimarsky / bcpandas

High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. No knowledge of BCP required!!
MIT License

bcpandas: wrong encoding when saving Spanish characters #170

Open anlagbr opened 9 months ago

anlagbr commented 9 months ago

bcpandas saves a pd.DataFrame with the default utf-8 encoding, and when the data is uploaded through bcp, some Spanish characters are not displayed correctly in the database. (They are displayed correctly in my pd.DataFrame.)

So far, I have tried passing -C 65001 to the bcp command by modifying the bcpandas source files. It did not work. I will post a solution if I find one.
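A minimal sketch of how this kind of corruption can arise, assuming the UTF-8 bytes end up being interpreted with a single-byte code page. The thread does not say which code page was involved; Windows-1252 (cp1252) is used here purely as an assumed example, and the sample text is made up for illustration.

```python
# Illustration only: cp1252 is an assumed code page, not one confirmed in the thread.
text = "España, niño, camión"

utf8_bytes = text.encode("utf-8")        # the bytes a UTF-8 data file would contain
misread = utf8_bytes.decode("cp1252")    # the same bytes interpreted as cp1252
roundtrip = utf8_bytes.decode("utf-8")   # the same bytes interpreted as UTF-8

print(misread)    # each accented character turns into two unrelated characters
print(roundtrip)  # identical to the original text
```

The DataFrame itself is fine in this scenario; the damage happens only when the bytes are read back with a different encoding than the one used to write them.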

Best.

vlasvlasvlas commented 9 months ago

I also had to change the default encoding previously when using bcpandas. It would be great if it could be passed as a parameter.

anlagbr commented 9 months ago


bcpandas uses a format file that it creates just in time. As a result, the -C 65001 flag has no effect, because the format file takes precedence.

However, you can pass the collation as a parameter to bcpandas.to_sql, and this solves the problem. I specified collation="Modern_Spanish_100_CI_AS_SC_UTF8" and it fixed my encoding problem.

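import bcpandas

# df is the pandas DataFrame to upload; creds is a bcpandas.SqlCreds instance.
bcpandas.to_sql(
    df,
    table.name,
    creds,
    collation="Modern_Spanish_100_CI_AS_SC_UTF8",  # UTF-8-enabled Spanish collation
    encoding="utf-8",
)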
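To make the format-file point above more concrete, here is a rough sketch of what a non-XML bcp format file for character data looks like and where the per-column collation sits. The helper `build_format_file`, the column names, the delimiter, and the version string are all hypothetical and chosen for illustration; this is not bcpandas' own code, and the file it actually generates may differ.

```python
def build_format_file(columns, collation, delimiter=",", version="15.0"):
    """Return the text of a non-XML bcp format file for character data.

    Hypothetical helper for illustration only; not part of bcpandas.
    The version string is a placeholder for the bcp utility version.
    """
    lines = [version, str(len(columns))]
    for i, name in enumerate(columns, start=1):
        # The last field is terminated by a newline, the others by the delimiter.
        terminator = "\\n" if i == len(columns) else delimiter
        # Fields: host order, host type, prefix length, data length,
        # terminator, server column order, server column name, collation.
        lines.append(
            f'{i}\tSQLCHAR\t0\t0\t"{terminator}"\t{i}\t{name}\t{collation}'
        )
    return "\n".join(lines) + "\n"


fmt = build_format_file(["name", "city"], "Modern_Spanish_100_CI_AS_SC_UTF8")
print(fmt)
```

Each column row ends with a collation name, which is the per-column setting that the collation argument discussed above would presumably control.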