luispedro / mahotas

Computer Vision in Python
https://mahotas.rtfd.io

Use cp1252 instead of utf8 for encoding of file names on Python 3 #13

Closed. cgohlke closed this 12 years ago

cgohlke commented 12 years ago

This will allow for some non-ASCII characters in file names.

cgohlke commented 12 years ago

The problem is that FreeImage does not understand UTF-8 encoded strings containing non-ASCII characters. freeimage.py passes c_char_p to the FreeImage functions, which are not aware of UTF-8 encoding, AFAICT. For example, loading a file named teßt.tif currently fails on Python 3 (verified on Windows), while it loads correctly when the name is encoded as cp1252 instead of UTF-8. Unless the freeimage.py wrapper is made fully Unicode-aware, accepting Unicode or UTF-8 encoded file names, I think cp1252 or latin1 is a better choice than UTF-8.
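To make the mismatch concrete, here is a minimal sketch of the bytes each encoding produces for the file name from the example. It only shows what a c_char_p argument would receive; how FreeImage and the operating system interpret those bytes is not modelled here.

```python
# The same file name, encoded three ways before being passed as char*.
name = "te\u00dft.tif"  # "teßt.tif"

utf8_bytes = name.encode("utf-8")      # ß becomes two bytes: 0xC3 0x9F
cp1252_bytes = name.encode("cp1252")   # ß becomes one byte: 0xDF
latin1_bytes = name.encode("latin-1")  # same single byte for this name

print(utf8_bytes)
print(cp1252_bytes)
print(latin1_bytes == cp1252_bytes)
```

An API that reads char* file names in a single-byte code page sees the two UTF-8 bytes as two separate characters, so it looks for a file with a different name.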

luispedro commented 12 years ago

OTOH, Unix does not understand anything but 8-bit byte strings, and the convention (the modern convention; it used to be a mess) is to use UTF-8.

Does the following work on Windows?

import locale
_, encoding = locale.getdefaultlocale()
if encoding is None:
    encoding = 'UTF-8'

On my system (Ubuntu), this is UTF-8. I added the check in case the old-style "C" locale is being used.
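As a rough sketch of how the locale lookup could feed into file name encoding, the helper below picks an encoding and converts a str file name to bytes. The function names filename_encoding and encode_filename are hypothetical and not part of mahotas. It also swaps in locale.getpreferredencoding(False) for locale.getdefaultlocale(), because the latter has been deprecated since Python 3.11; the two can report different values on some systems.

```python
import locale


def filename_encoding(default="UTF-8"):
    """Return the locale's preferred encoding, or `default` if none is reported."""
    # Passing False reads the current locale settings without modifying them.
    encoding = locale.getpreferredencoding(False)
    return encoding or default


def encode_filename(filename, encoding=None):
    """Encode a str file name to bytes for a char* API (hypothetical helper)."""
    if isinstance(filename, bytes):
        # Already encoded by the caller; pass it through unchanged.
        return filename
    return filename.encode(encoding or filename_encoding())


print(filename_encoding())
print(encode_filename("te\u00dft.tif", "cp1252"))
```

Names that the chosen encoding cannot represent raise UnicodeEncodeError here, so a caller would still need to decide how to handle them.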

cgohlke commented 12 years ago

Thanks. locale.getdefaultlocale works on my system, and that's better than hardcoding the encoding. I'll change the patch. Probably the best way to make freeimage.py handle Unicode file names would be to use FreeImage's Unicode-aware functions, e.g. FreeImage_LoadU.
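A hedged sketch of what calling FreeImage_LoadU through ctypes might look like follows. It assumes the caller has already loaded the FreeImage shared library and passes the handle in as lib; the wrapper name load_unicode is hypothetical and this is not the code in freeimage.py. The argument types follow the C signature documented by FreeImage (format id, wchar_t* file name, flags), and the FreeImage documentation describes this function as working only on Windows.

```python
import ctypes


def load_unicode(lib, fif, filename, flags=0):
    """Call FreeImage_LoadU on an already-loaded FreeImage handle `lib`.

    `fif` is the FreeImage format id and `filename` is a str, so no
    manual encoding to bytes is needed.
    """
    func = lib.FreeImage_LoadU
    # c_wchar_p lets ctypes pass a Python str as wchar_t*.
    func.argtypes = [ctypes.c_int, ctypes.c_wchar_p, ctypes.c_int]
    # FreeImage returns an opaque FIBITMAP*; keep it as a plain pointer.
    func.restype = ctypes.c_void_p
    return func(fif, filename, flags)
```

With this approach the file name never goes through c_char_p, so the choice between cp1252 and UTF-8 does not arise on platforms where the function is available.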