Closed thouis closed 12 years ago
Comment in Trac by trac user marco, 2008-01-10
Reference to the OLPC ticket: http://dev.laptop.org/ticket/5559
Comment in Trac by atmention:rkern, 2008-01-10
I cannot reproduce with Python 2.5.1, SVN numpy, and pygtk-2.10.3 on OS X. Nor can I replicate it with SVN numpy and explicitly calling sys.setdefaultencoding('utf-8'). Can you try SVN numpy or even numpy 1.0.4 on your system?
Comment in Trac by atmention:rkern, 2008-01-10
I cannot replicate this on Ubuntu with numpy 1.0.3.1 and python-gtk 2.10.4-ubuntu3.
$ LANG=tr_TR.UTF-8 python -c "import gtk, numpy"
$ LANG=tr_TR.UTF-8 python -c "import sys;reload(sys);sys.setdefaultencoding('utf-8');import numpy"
$
I can replicate it on my XO after importing GTK, but not simply setting the default encoding. I have not yet set up a development environment to compile a new version of numpy, though.
It appears that 'i' characters in the keys are getting capitalized, for some reason. If you are familiar with Turkish, you might be able to tell me why. We do call .lower() on some of these type names, starting from an all-uppercase version of the name. It is possible that a locale-dependent transformation is being installed, and 'I' happens to be a valid lowercase character in Turkish. The wrong keys are 'voId', 'unIcode', 'uIntp', 'uInt', 'strIng', 'signedInteger', 'Intp', 'Integer', 'Int', and 'Inexact'. Interestingly, the bit-width-precise variants of these are fine, e.g. 'void0'; consequently, we can narrow the problem down to the function {{{_add_types()}}}. The smallest test case is as follows:
$ LANG=tr_TR.UTF-8 python -c "import gtk;print 'VOID'.lower()"
voId
$ LANG=tr_TR.UTF-8 python -c "print 'VOID'.lower()"
void
$ LANG=tr_TR.UTF-8 python -c "import sys;reload(sys);sys.setdefaultencoding('utf-8');print 'VOID'.lower()"
void
AFAICT, there is nothing wrong with numpy, and there is little that we can do in numpy to work around this. Something in GTK is messing with the way .lower() works beyond the different locale and the setdefaultencoding() abuse. If there is a locale such that {{{'VOID'.lower() != 'void'}}}, I'm happy to revise this assessment, but it looks like something in GTK is the source of the problem.
Comment in Trac by atmention:rkern, 2008-01-12
Apparently, Turkish has dotted and dotless variants of the Latin letter "I", each with an uppercase and a lowercase version. {{{str.lower()}}} is locale-dependent. We need to replace all occurrences of {{{str.lower()}}} with a locale-independent version. I am going to add functions to numpy.lib.utils called {{{english_lower()}}}, {{{english_upper()}}} and {{{english_capitalize()}}} for use internally.
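For illustration, here is a rough sketch of what such locale-independent helpers could look like. The function names come from the comment above; the bodies are an assumption, written in Python 3 syntax, and are not necessarily what ended up in numpy. The idea is to map only the 26 ASCII letters through fixed translation tables, so no locale setting can change the result.

```python
import string

# Fixed translation tables covering only the 26 ASCII letters.
# Everything else (digits, punctuation, non-ASCII characters) passes through unchanged.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def english_lower(s):
    """Lowercase ASCII letters only, ignoring the current locale."""
    return s.translate(_ASCII_LOWER)


def english_upper(s):
    """Uppercase ASCII letters only, ignoring the current locale."""
    return s.translate(_ASCII_UPPER)


def english_capitalize(s):
    """Uppercase the first character with English rules; leave the rest as-is."""
    if s:
        return english_upper(s[0]) + s[1:]
    return s


print(english_lower('VOID'))
print(english_upper('intp'))
print(english_capitalize('unicode'))
```

Because the tables are built once from {{{string.ascii_uppercase}}} and {{{string.ascii_lowercase}}}, 'VOID' always maps to 'void' regardless of what {{{locale.setlocale()}}} has been told.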
Comment in Trac by atmention:rkern, 2008-01-12
Hmm. Actually, they need to go somewhere else because they need to be used during the build process.
Comment in Trac by atmention:rkern, 2008-01-12
A better replication of the problem without GTK:
$ LANG=tr_TR.UTF-8 python -c "import locale;locale.setlocale(locale.LC_ALL, '');print repr('VOID'.lower())"
'voId'
The problem may exist in buggy versions of glibc, as this is (apparently) not even the correct result in the Turkish locale. Nonetheless, the result of {{{str.lower()}}} is documented to be locale-dependent, so we should not rely on English rules for internal strings like 'VOID' here.
Comment in Trac by atmention:rkern, 2008-01-12
FWIW, on my OS X Leopard box:
$ LANG=tr_TR LC_ALL=tr_TR python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'tr_TR'
>>> 'VOID'.lower()
'void'
>>> u'VOID'.lower()
u'void'
On a Fedora Core system (I'm not entirely sure what glibc version):
$ LANG=tr_TR LC_ALL=tr_TR python
Python 2.4.3 (#1, Mar 14 2007, 19:01:42)
[GCC 4.1.1 20070105 (Red Hat 4.1.1-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'tr_TR'
>>> 'VOID'.lower()
'vo\xfdd'
>>> u'VOID'.lower()
u'void'
>>>
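As a side note on the Fedora output: the '\xfd' byte is consistent with the lowercase dotless ı, if one assumes that the tr_TR locale on that machine uses ISO-8859-9 (the session above does not show the charset, so this is a guess). A small check in Python 3 syntax, using only the standard codecs:

```python
# Assumption: the tr_TR locale in the Fedora session uses ISO-8859-9 (Latin-5).
raw = b'vo\xfdd'
decoded = raw.decode('iso8859_9')

# U+0131 is LATIN SMALL LETTER DOTLESS I.
dotless_i = '\u0131'
print(decoded == 'vo' + dotless_i + 'd')

# Unicode's default (non-Turkish) case mapping sends the dotless i back to plain 'I'.
print(dotless_i.upper() == 'I')
```

Under that assumption, the byte string result follows the Turkish rule (uppercase 'I' lowers to dotless 'ı'), while the unicode result u'void' follows the locale-independent Unicode mapping.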
Comment in Trac by atmention:rkern, 2008-02-29
I fixed the particular problem found by the OLPC project. numpy will import in the tr_TR.UTF-8 locale. Some text-heavy things like f2py code generation may still not work because of the misinterpretation of English as Turkish.
Original ticket http://projects.scipy.org/numpy/ticket/643 Reported 2008-01-10 by trac user marco, assigned to atmention:rkern.
The following results in a traceback. I suspect it might be related to the fact that importing gtk changes the default encoding to utf-8. I'm using Python 2.5.1 and numpy 1.0.3.1.
LANG=tr_TR.UTF-8 python -c "import gtk, numpy"