numpy / numpy

The fundamental package for scientific computing with Python.
https://numpy.org

Tracebacks in Turkish locales (Trac #643) #1241

Closed numpy-gitbot closed 12 years ago

numpy-gitbot commented 12 years ago

Original ticket http://projects.scipy.org/numpy/ticket/643 on 2008-01-10 by trac user marco, assigned to @rkern.

The following results in a traceback. I suspect it might be related to the fact that importing gtk changes the default encoding to utf-8. I'm using Python 2.5.1 and numpy 1.0.3.1.

LANG=tr_TR.UTF-8 python -c "import gtk, numpy"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.5/site-packages/numpy/__init__.py", line 39, in <module>
    import core
  File "/usr/lib64/python2.5/site-packages/numpy/core/__init__.py", line 8, in <module>
    import numerictypes as nt
  File "/usr/lib64/python2.5/site-packages/numpy/core/numerictypes.py", line 241, in <module>
    void = allTypes['void']
KeyError: 'void'

numpy-gitbot commented 12 years ago

trac user marco wrote on 2008-01-10

Reference to the OLPC ticket: http://dev.laptop.org/ticket/5559

numpy-gitbot commented 12 years ago

@rkern wrote on 2008-01-10

I cannot reproduce this with Python 2.5.1, SVN numpy, and pygtk-2.10.3 on OS X. Nor can I replicate it with SVN numpy after explicitly calling sys.setdefaultencoding('utf-8'). Can you try SVN numpy or even numpy 1.0.4 on your system?

numpy-gitbot commented 12 years ago

@rkern wrote on 2008-01-10

I cannot replicate this on Ubuntu with numpy 1.0.3.1 and python-gtk 2.10.4-ubuntu3.

$ LANG=tr_TR.UTF-8 python -c "import gtk, numpy"
$ LANG=tr_TR.UTF-8 python -c "import sys;reload(sys);sys.setdefaultencoding('utf-8');import numpy"
$ 

I can replicate it on my XO after importing GTK, but not simply setting the default encoding. I have not yet set up a development environment to compile a new version of numpy, though.

It appears that the 'I' characters in the keys are not being lowercased, for some reason. If you are familiar with Turkish, you might be able to tell me why. We do call .lower() on some of these type names, starting from an all-uppercase version of the name. It is possible that a locale-dependent transformation is being installed, and 'I' happens to be a valid lowercase character in Turkish.

The wrong keys are 'voId', 'unIcode', 'uIntp', 'uInt', 'strIng', 'signedInteger', 'Intp', 'Integer', 'Int', and 'Inexact'. Interestingly, the bit-width-specific variants of these are fine, e.g. 'void0'; consequently, we can narrow the problem down to the function _add_types(). The smallest test case is as follows:

$ LANG=tr_TR.UTF-8 python -c "import gtk;print 'VOID'.lower()"
voId
$ LANG=tr_TR.UTF-8 python -c "print 'VOID'.lower()"
void
$ LANG=tr_TR.UTF-8 python -c "import sys;reload(sys);sys.setdefaultencoding('utf-8');print 'VOID'.lower()"
void

AFAICT, there is nothing wrong with numpy, and there is little that we can do in numpy to work around this. Something in GTK is messing with the way .lower() works beyond the different locale and the setdefaultencoding() abuse. If there is a locale such that 'VOID'.lower() != 'void', I'm happy to revise this assessment, but it looks like something in GTK is the source of the problem.
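
A minimal Python 3 simulation may make the failure chain clearer. Both buggy_lower() and build_type_table() are hypothetical: the first only mimics the observed behaviour (every letter except 'I' is lowercased, so 'VOID' becomes 'voId'), and the second only loosely models type names being lowercased into a lookup dictionary. Neither is numpy's actual _add_types() code.

```python
def buggy_lower(s):
    """Hypothetical stand-in for the broken lowering seen after 'import gtk':
    every character except 'I' is lowercased, so 'VOID' becomes 'voId'."""
    return "".join(c if c == "I" else c.lower() for c in s)


def build_type_table(upper_names, lower):
    """Hypothetical builder: map lower(name) -> name for each uppercase name."""
    return {lower(name): name for name in upper_names}


names = ["VOID", "UNICODE", "INTP", "FLOAT"]

ok = build_type_table(names, str.lower)     # ASCII-consistent lowering
bad = build_type_table(names, buggy_lower)  # simulated locale breakage

print(sorted(ok))   # ['float', 'intp', 'unicode', 'void']
print(sorted(bad))  # ['Intp', 'float', 'unIcode', 'voId']

try:
    bad["void"]  # analogous to allTypes['void'] in numerictypes.py
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'void'
```

Nothing downstream has to be wrong for this to fail: once the lowering step disagrees with ASCII rules, the later lookup of the plain 'void' key raises the same KeyError as the traceback.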

numpy-gitbot commented 12 years ago

@rkern wrote on 2008-01-12

Apparently, Turkish has dotted and dotless variants of the Latin letter "I", each with an uppercase and a lowercase form. str.lower() is locale-dependent, so we need to replace all occurrences of str.lower() with a locale-independent version. I am going to add functions called english_lower(), english_upper() and english_capitalize() to numpy.lib.utils for internal use.
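
A minimal sketch of such helpers, assuming Python 3 and assuming that internal names only ever need case changes on the 26 ASCII letters. The function names come from the comment above; the bodies are illustrative and may differ from what actually ends up in numpy.

```python
import string

# Translation tables that touch only the 26 ASCII letters, so the result
# never depends on the process locale.
_LOWER_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_UPPER_TABLE = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def english_lower(s):
    """Lowercase ASCII letters only; every other character is unchanged."""
    return s.translate(_LOWER_TABLE)


def english_upper(s):
    """Uppercase ASCII letters only; every other character is unchanged."""
    return s.translate(_UPPER_TABLE)


def english_capitalize(s):
    """Uppercase the first character with English rules; keep the rest as-is."""
    if not s:
        return s
    return english_upper(s[0]) + s[1:]


print(english_lower("VOID"))       # void
print(english_upper("intp"))       # INTP
print(english_capitalize("int8"))  # Int8
```

Because str.translate() only consults the explicit tables, english_lower('VOID') gives 'void' regardless of locale, and non-ASCII characters such as 'İ' pass through untouched. Unlike str.capitalize(), which also lowercases the remainder, this sketch's english_capitalize() leaves the rest of the string alone.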

numpy-gitbot commented 12 years ago

@rkern wrote on 2008-01-12

Hmm. Actually, they need to go somewhere else because they need to be used during the build process.

numpy-gitbot commented 12 years ago

@rkern wrote on 2008-01-12

A better replication of the problem without GTK:

$ LANG=tr_TR.UTF-8 python -c "import locale;locale.setlocale(locale.LC_ALL, '');print repr('VOID'.lower())"
'voId'

The problem may lie in buggy versions of glibc, as this is (apparently) not even the correct result in the Turkish locale. Nonetheless, the result of str.lower() is documented to be locale-dependent, so we should not rely on it to apply English rules to internal strings like 'VOID'.

numpy-gitbot commented 12 years ago

Title changed from "Traceback when imported after gtk" to "Tracebacks in Turkish locales" by @rkern on 2008-01-12

numpy-gitbot commented 12 years ago

@rkern wrote on 2008-01-12

FWIW, on my OS X Leopard box:

$ LANG=tr_TR LC_ALL=tr_TR python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'tr_TR'
>>> 'VOID'.lower()
'void'
>>> u'VOID'.lower()
u'void'

On a Fedora Core system (I'm not entirely sure which glibc version):

$ LANG=tr_TR LC_ALL=tr_TR python
Python 2.4.3 (#1, Mar 14 2007, 19:01:42) 
[GCC 4.1.1 20070105 (Red Hat 4.1.1-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'tr_TR'
>>> 'VOID'.lower()
'vo\xfdd'
>>> u'VOID'.lower()
u'void'
>>> 
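
The Fedora result 'vo\xfdd' is a byte string. Assuming the plain tr_TR locale on that system uses the ISO-8859-9 (Latin-5) character set, which is typical for glibc but not stated in the thread, byte 0xFD is the Turkish dotless 'ı' (U+0131). A quick check in Python 3:

```python
import unicodedata

# Byte string exactly as printed by the Python 2.4 session above.
raw = b"vo\xfdd"

# Assumption: the tr_TR locale's charset is ISO-8859-9 (Latin-5).
text = raw.decode("iso8859_9")

print(unicodedata.name(text[2]))  # LATIN SMALL LETTER DOTLESS I
print(text == "vo\u0131d")        # True
print(text == "void")             # False
```

So on that box the byte-string lower() applied the Turkish rule I → ı, while u'VOID'.lower() used the locale-independent Unicode mapping. Either way, the byte-string result is not the 'void' key numpy needs.
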
numpy-gitbot commented 12 years ago

@rkern wrote on 2008-02-29

I fixed the particular problem found by the OLPC project: numpy now imports in the tr_TR.UTF-8 locale. Some text-heavy code, such as f2py code generation, may still not work, because English identifiers can still be case-mapped with Turkish rules.