yhat / db.py

db.py is an easier way to interact with your databases
BSD 2-Clause "Simplified" License

UTF-8 values are double-encoded (at least from PostgreSQL) #55

Open hantusk opened 9 years ago

hantusk commented 9 years ago

In a DataFrame returned by, e.g., db.tables.table.all(), UTF-8 values from PostgreSQL were double-encoded (encoded as UTF-8 twice).

When I later had to save my DataFrame to an Excel sheet or a .csv file, I had to call .decode('utf-8') on every value in the DataFrame before it would export. It took some troubleshooting to figure that out.
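To make "encoded as UTF-8 twice" concrete, here is a minimal sketch in Python 3 (the thread uses Python 2, where these values would be `str` byte strings). It assumes one common way this happens: UTF-8 bytes are misread as Latin-1 somewhere between the database and the client, then encoded to UTF-8 again. The helpers `double_encode`, `undo_double_encoding` and `fix_rows` are hypothetical and not part of db.py.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then UTF-8-encoded again."""
    once = text.encode("utf-8")        # correct UTF-8 bytes
    misread = once.decode("latin-1")   # every byte becomes one (wrong) character
    return misread.encode("utf-8")     # the second, unwanted UTF-8 layer


def undo_double_encoding(raw: bytes) -> str:
    """Reverse the three steps of double_encode in the opposite order."""
    misread = raw.decode("utf-8")      # strip the outer UTF-8 layer
    once = misread.encode("latin-1")   # recover the original UTF-8 bytes
    return once.decode("utf-8")        # decode the real text


def fix_rows(rows):
    """Apply the repair to every bytes cell in a list of row tuples; leave other cells alone."""
    return [
        tuple(undo_double_encoding(c) if isinstance(c, bytes) else c for c in row)
        for row in rows
    ]


if __name__ == "__main__":
    broken = double_encode("Søren")
    print(broken.decode("utf-8"))        # SÃ¸ren  (each non-ASCII char shows up as two)
    print(undo_double_encoding(broken))  # Søren
    print(fix_rows([(1, broken)]))       # [(1, 'Søren')]
```

Which repair is right depends on where the extra layer is added. If cells already arrive as decoded `str` mojibake such as `"SÃ¸ren"`, only the last two steps (`.encode("latin-1").decode("utf-8")`) apply.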

fnielsen commented 9 years ago

I have another UTF-8 problem, with MySQL, Python 2.7 and db.py 0.4.0.

As far as I can tell, the database I connect to is UTF-8:

mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA S
    -> WHERE schema_name = "schema_name";
+----------------------------+
| default_character_set_name |
+----------------------------+
| utf8                       |
+----------------------------+

df = database.tables.table.all() gives me data as str rather than unicode. I then do unicode(cell, 'iso8859'), which so far seems to work. cell.decode('utf-8'), as @hantusk does, does not work for me.
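One plausible reading of why unicode(cell, 'iso8859') works while cell.decode('utf-8') fails: decoding as ISO-8859-1 can never raise, because every byte value maps to a character, whereas UTF-8 rejects byte sequences that are not valid UTF-8. A failing UTF-8 decode therefore suggests the bytes really are Latin-1, for example if the client connection uses a `latin1` character set even though the schema default is `utf8` (an assumption; the thread does not confirm it). The Python 3 sketch below shows both behaviours and a hypothetical `to_text` helper that tries UTF-8 first and falls back to Latin-1.

```python
def to_text(cell: bytes) -> str:
    """Decode a byte string, preferring UTF-8 and falling back to Latin-1 (hypothetical helper)."""
    try:
        return cell.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 never raises, but the result is only correct if the bytes really are Latin-1.
        return cell.decode("iso8859-1")


if __name__ == "__main__":
    latin1_bytes = "café".encode("iso8859-1")  # b'caf\xe9'
    utf8_bytes = "café".encode("utf-8")        # b'caf\xc3\xa9'

    try:
        latin1_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("UTF-8 rejects these Latin-1 bytes:", exc.reason)

    # Latin-1 also "succeeds" on UTF-8 bytes, but silently produces mojibake.
    print(utf8_bytes.decode("iso8859-1"))  # cafÃ©

    print(to_text(latin1_bytes), to_text(utf8_bytes))  # café café
```

The fallback is a heuristic: Latin-1 text that happens to form valid UTF-8 would be decoded as UTF-8. Fixing the connection character set at the source avoids guessing altogether.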

Update: I suppose that, in my case, cell.decode('unicode_escape') is better than unicode(cell, 'iso8859').
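When decoding, `unicode_escape` does two things: it maps each non-ASCII byte to the character with the same Latin-1 code point (just like `iso8859`), and it also interprets backslash escapes such as `\xe9` or `\u00e9`. One possible explanation for it working better here, not confirmed in the thread, is that some values contain such escape sequences as literal text. The Python 3 sketch below shows both behaviours, plus a pitfall: genuine backslashes in the data are interpreted as escapes too.

```python
# Bytes containing a literal backslash escape (seven characters: c a f \ x e 9).
escaped = b"caf\\xe9"
print(escaped.decode("unicode_escape"))     # café
print(escaped.decode("iso8859-1"))          # caf\xe9  (escape left as plain text)

# Bare non-ASCII bytes are read as Latin-1, the same as with iso8859-1.
print(b"caf\xe9".decode("unicode_escape"))  # café

# Pitfall: a real backslash followed by "n" turns into a newline character.
path = b"C:\\new"
print("\n" in path.decode("unicode_escape"))  # True
```

If the data can legitimately contain backslashes (paths, regexes), `unicode_escape` will corrupt it, so it is safest to apply only to columns known to hold escaped text.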