Closed numpy-gitbot closed 12 years ago
trac user marco wrote on 2008-01-10
Reference to the OLPC ticket: http://dev.laptop.org/ticket/5559
@rkern wrote on 2008-01-10
I cannot reproduce with Python 2.5.1, SVN numpy, and pygtk-2.10.3 on OS X. Nor can I replicate it with SVN numpy and explicitly calling sys.setdefaultencoding('utf-8'). Can you try SVN numpy or even numpy 1.0.4 on your system?
@rkern wrote on 2008-01-10
I cannot replicate this on Ubuntu with numpy 1.0.3.1 and python-gtk 2.10.4-ubuntu3.
$ LANG=tr_TR.UTF-8 python -c "import gtk, numpy"
$ LANG=tr_TR.UTF-8 python -c "import sys;reload(sys);sys.setdefaultencoding('utf-8');import numpy"
$
I can replicate it on my XO after importing GTK, but not simply setting the default encoding. I have not yet set up a development environment to compile a new version of numpy, though.
It appears that 'i' characters in the keys are getting capitalized, for some reason. If you are familiar with Turkish, you might be able to tell me why. We do call .lower() on some of these type names from an all-uppercase version of the name. It is possible that a locale-dependent transformation is being installed, and 'I' happens to be a valid lowercase character in Turkish. The wrong keys are 'voId', 'unIcode', 'uIntp', 'uInt', 'strIng', 'signedInteger', 'Intp', 'Integer', 'Int', and 'Inexact'. Interestingly, the bit-width-precise variants of these are fine, e.g. 'void0'; consequently, we can isolate the problem down to the function _add_types(). The smallest test case is as follows:
$ LANG=tr_TR.UTF-8 python -c "import gtk;print 'VOID'.lower()"
voId
$ LANG=tr_TR.UTF-8 python -c "print 'VOID'.lower()"
void
$ LANG=tr_TR.UTF-8 python -c "import sys;reload(sys);sys.setdefaultencoding('utf-8');print 'VOID'.lower()"
void
AFAICT, there is nothing wrong with numpy, and there is little that we can do in numpy to work around this. Something in GTK is messing with the way .lower() works beyond the different locale and the setdefaultencoding() abuse. If there is a locale such that 'VOID'.lower() != 'void', I'm happy to revise this assessment, but it looks like something in GTK is the source of the problem.
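To make the failure mode concrete, here is a minimal sketch of how a misbehaving lowercasing step corrupts dictionary keys built from uppercase names. It assumes a hypothetical `buggy_lower()` that mimics the 'VOID' -> 'voId' result, and an illustrative list of names; neither is numpy's actual code or type table.

```python
# Hypothetical stand-in for the misbehaving str.lower(): it lowercases every
# letter except 'I', mimicking the 'VOID' -> 'voId' result reported here.
def buggy_lower(s):
    return "".join(c if c == "I" else c.lower() for c in s)


# Illustrative subset of uppercase type names; not numpy's real table.
TYPE_NAMES = ["VOID", "UNICODE", "UINT", "STRING", "FLOAT", "BOOL"]


def build_keys(lower):
    """Map lowercased name -> original name using the given lowercasing function."""
    return {lower(name): name for name in TYPE_NAMES}


good = build_keys(str.lower)
bad = build_keys(buggy_lower)

print(sorted(bad))                    # names containing 'I' keep a capital 'I'
print("void" in good, "void" in bad)  # a later lookup of 'void' only works in `good`
```

Names without an 'I', such as 'FLOAT', come out the same either way, which may be why only some keys were affected.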
@rkern wrote on 2008-01-12
Apparently, Turkish has dotted and dotless variants of the Latin letter "I", each with an uppercase and a lowercase version. str.lower() is locale-dependent. We need to replace all occurrences of str.lower() with a locale-independent version. I am going to add functions to numpy.lib.utils called english_lower(), english_upper() and english_capitalize() for use internally.
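One possible way to write such helpers is with fixed ASCII translation tables, which never consult the locale. This is only a sketch in Python 3 syntax: the function names come from the comment above, but the bodies are an assumption, not necessarily what was committed to numpy.

```python
import string

# Translation tables that touch only the 26 ASCII letters, so the result
# cannot depend on the active locale.
_LOWER_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_UPPER_TABLE = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def english_lower(s):
    """Lowercase ASCII letters only; every other character is left alone."""
    return s.translate(_LOWER_TABLE)


def english_upper(s):
    """Uppercase ASCII letters only; every other character is left alone."""
    return s.translate(_UPPER_TABLE)


def english_capitalize(s):
    """Uppercase the first character by English rules; leave the rest as-is."""
    return english_upper(s[:1]) + s[1:]


print(english_lower("VOID"), english_upper("intp"), english_capitalize("int8"))
```

Slicing with `s[:1]` lets `english_capitalize()` handle the empty string without a special case.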
@rkern wrote on 2008-01-12
Hmm. Actually, they need to go somewhere else because they need to be used during the build process.
@rkern wrote on 2008-01-12
A better replication of the problem without GTK:
$ LANG=tr_TR.UTF-8 python -c "import locale;locale.setlocale(locale.LC_ALL, '');print repr('VOID'.lower())"
'voId'
The problem may exist in buggy versions of glibc, as this is (apparently) not even the correct result in the Turkish locale. Nonetheless, the result of str.lower() is documented to be locale-dependent, so we should not rely on English rules for internal strings like 'VOID' here.
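A regression check along these lines could guard against the problem coming back. The sketch below uses a hypothetical helper, `lower_under_locale()`, that tries to switch to a Turkish locale, lowercases with an ASCII-only table, and restores the previous locale. It is written for Python 3, where str.lower() follows Unicode rules rather than the C locale, so it shows the pattern rather than reproducing the original bug.

```python
import locale
import string

# ASCII-only table: the lowercasing below never consults the locale.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lower_under_locale(text, locale_name="tr_TR.UTF-8"):
    """Return (locale_was_set, lowered_text) with LC_CTYPE temporarily changed."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, locale_name)
            was_set = True
        except locale.Error:
            was_set = False  # the requested locale is not installed here
        return was_set, text.translate(ASCII_LOWER)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


was_set, lowered = lower_under_locale("VOID")
print(was_set, repr(lowered))
```

The first element of the result tells the caller whether the Turkish locale was actually available, so a test can report a skip instead of a false pass.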
Title changed from "Traceback when imported after gtk" to "Tracebacks in Turkish locales" by @rkern on 2008-01-12
@rkern wrote on 2008-01-12
FWIW, on my OS X Leopard box:
$ LANG=tr_TR LC_ALL=tr_TR python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'tr_TR'
>>> 'VOID'.lower()
'void'
>>> u'VOID'.lower()
u'void'
On a Fedora Core system (I'm not entirely sure what glibc version):
$ LANG=tr_TR LC_ALL=tr_TR python
Python 2.4.3 (#1, Mar 14 2007, 19:01:42)
[GCC 4.1.1 20070105 (Red Hat 4.1.1-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'tr_TR'
>>> 'VOID'.lower()
'vo\xfdd'
>>> u'VOID'.lower()
u'void'
>>>
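The '\xfd' byte in the Fedora output can be interpreted if one assumes that the bare tr_TR locale on that system uses the ISO-8859-9 character set (an assumption; the session does not show the charset). Under that assumption, the following sketch decodes the byte string and names the resulting character.

```python
import unicodedata

# Bytes reported by 'VOID'.lower() in the Fedora session above.
raw = b"vo\xfdd"

# Assumption: the tr_TR locale there used ISO-8859-9 (Latin-5).
decoded = raw.decode("iso8859_9")

print(repr(decoded))
print(unicodedata.name(decoded[2]))
```

If the assumption holds, the Fedora system lowercased 'I' to the Turkish dotless 'ı', which is the expected Turkish behaviour, whereas the 'voId' result under tr_TR.UTF-8 left the 'I' untouched.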
@rkern wrote on 2008-02-29
I fixed the particular problem found by the OLPC project. numpy will import in the tr_TR.UTF-8 locale. Some text-heavy things like f2py code generation may still not work because of the misinterpretation of English as Turkish.
Original ticket http://projects.scipy.org/numpy/ticket/643 on 2008-01-10 by trac user marco, assigned to @rkern.
The following results in a traceback. I suspect it might be related to the fact that importing gtk changes the default encoding to utf-8. I'm using Python 2.5.1 and numpy 1.0.3.1:
LANG=tr_TR.UTF-8 python -c "import gtk, numpy"