seckcoder / prettytable

Automatically exported from code.google.com/p/prettytable

Unittest failure with py3.2: prettytable_test.PrintJapanestTest #26

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,
while preparing the Debian package and building it in a clean chroot, I'm
getting this error:

testPrint (prettytable_test.PrintJapanestTest) ... ERROR
testSliceAll (prettytable_test.SlicingTests) ... ok
testSliceFirstTwoRows (prettytable_test.SlicingTests) ... ok
testSliceLastTwoRows (prettytable_test.SlicingTests) ... ok
testReverseSort (prettytable_test.SortingTests) ... ok
testSortBy (prettytable_test.SortingTests) ... ok
testSortKey (prettytable_test.SortingTests) ... ok

======================================================================
ERROR: testPrint (prettytable_test.PrintJapanestTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/buildd/prettytable-0.7/prettytable_test.py", line 566, in testPrint
    print(self.x)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 108-109: ordinal not in range(128)

----------------------------------------------------------------------
Ran 49 tests in 0.079s

FAILED (errors=1)

This is odd, given that the tests run fine outside the chroot, so I wanted to
ask whether you know more.

Cheers,
Sandro

Original issue reported on code.google.com by sandro.tosi on 21 Feb 2013 at 9:39

GoogleCodeExporter commented 9 years ago
Hi Sandro,

Thanks for reporting this; it is strange.  The only thing I can think of to
check off the top of my head is: what is the environment variable LANG set to
inside your chroot environment?

Original comment by luke@maurits.id.au on 22 Feb 2013 at 2:50

GoogleCodeExporter commented 9 years ago
# echo $LANG
C

Original comment by sandro.tosi on 22 Feb 2013 at 6:06

GoogleCodeExporter commented 9 years ago
I think that is the problem.  I just tried running the test suite with LANG=C 
and I get the same error you do.  Setting LANG back to en_US.UTF-8 gets it to 
work like normal.  I guess the reason you can run it fine outside the chroot is 
that in your normal environment you too have LANG set to something with UTF-8 
in it.

Line 566, print(self.x), calls the PrettyTable object's __str__ method.  Under 
Python 3.x, this returns a unicode string (i.e. the string is not encoded).  I 
don't understand in detail how the Python print function hands that unicode 
string to your console or terminal emulator, but at some point it has to be 
encoded, and the value of LANG influences which encoding is used.  When 
LANG=C, the string is encoded as ASCII, which fails when it contains, e.g., 
Japanese text, as in this case.  If LANG is set to something like en_US.UTF-8 
(or, I presume, any locale ending in .UTF-8), the string is encoded as UTF-8 
instead, which of course works just fine.
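
To make the mechanism above concrete, here is a minimal sketch that reproduces the failure without touching the real locale.  It assumes that an in-memory io.TextIOWrapper is a fair stand-in for sys.stdout; the helper name print_to_stream is made up for this illustration and is not part of PrettyTable or its test suite.

```python
import io

# "日本語", written with escapes so this file itself stays pure ASCII.
text = "\u65e5\u672c\u8a9e"


def print_to_stream(value, encoding):
    """Print value to an in-memory text stream that uses the given encoding.

    Hypothetical helper: the TextIOWrapper plays the role of sys.stdout.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(value, file=stream)
    stream.flush()
    return raw.getvalue()


# An ASCII text stream cannot represent the Japanese characters.
try:
    print_to_stream(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A UTF-8 text stream encodes the same string without complaint.
utf8_bytes = print_to_stream(text, "utf-8")
```

The same print call succeeds or fails depending only on the encoding attached to the output stream, which is the part that the locale influences for the real sys.stdout.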

Note that in Python 2, __str__ *can* return a unicode string, but the more 
normal behaviour is for it to return an encoded byte string (while 
__unicode__() must return a unicode object).  In PrettyTable, __str__ under 
Python 2 returns an encoded string, and by default it is encoded using UTF-8 
(unless the user overrides this).  So this very same test works just fine under 
Python 2 even if LANG=C.
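
The Python 2 behaviour described above can be approximated in Python 3 by encoding the text yourself and writing the bytes to the binary layer underneath the text stream.  The sketch below assumes an in-memory stream in place of sys.stdout, whose binary layer is normally reachable as sys.stdout.buffer; it is an illustration, not what PrettyTable does.

```python
import io

text = "\u65e5\u672c\u8a9e"  # "日本語"

# Stand-in for an ASCII-only sys.stdout, as seen under LANG=C.
raw = io.BytesIO()
ascii_stdout = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

# Pre-encoded bytes go straight to the binary layer, so the ASCII codec
# attached to the text layer is never consulted.
ascii_stdout.buffer.write(text.encode("utf-8") + b"\n")
ascii_stdout.buffer.flush()

written = raw.getvalue()
```

This is why an already-encoded byte string is indifferent to LANG: the stream only has to pass the bytes through.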

__str__ is certainly expected to return unicode in Python 3, so it seems to me 
that Python 3 applications are fundamentally incompatible with LANG=C, unless 
they are written specifically to overcome this.  In principle I suppose it 
would be possible to check the value of LANG at the time of calling __str__ 
and return an encoded string if LANG=C, but I am not sure if Python 3 will 
object to __str__ returning a bytes object, and I'm also not sure if that 
would be considered bad practice or not.
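
On the open question above: Python 3 does object when __str__ returns bytes.  The sketch below uses a made-up class, BytesStr, purely for illustration; it is not PrettyTable code.

```python
class BytesStr:
    """Hypothetical class whose __str__ returns encoded bytes instead of str."""

    def __str__(self):
        return "\u65e5\u672c\u8a9e".encode("utf-8")


# str() (and therefore print()) rejects a non-str result from __str__.
try:
    str(BytesStr())
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

So returning an encoded string from __str__ when LANG=C is not an option under Python 3; any workaround would have to act on the output stream instead.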

At any rate, I don't *think* this is a bug in PrettyTable per se.  The problem 
is that your chroot environment has what is probably by now considered an 
obsolete setting of LANG.  I don't remember exactly how the Debian installer 
goes with locales, but I suspect it probably has a UTF-8 option selected by 
default?  I don't think that this issue is likely to impact very many people 
using the package, because most of them should have a more unicode-friendly 
LANG setting in their day-to-day work environments.

Original comment by luke@maurits.id.au on 22 Feb 2013 at 6:24

GoogleCodeExporter commented 9 years ago
Sandro, were you able to either
(i) get your chroot packaging environment using a non-C locale, or
(ii) verify that the package works correctly anyway when installed in a 
"proper" Debian system as set up by the installer?

Original comment by luke@maurits.id.au on 19 Mar 2013 at 2:24

GoogleCodeExporter commented 9 years ago
Hi Luke,
it was indeed the lack of a UTF-8 charset in the build chroot; once I
installed one and instructed the tests to use it, they passed, so I was able
to upload 0.7.2 to Debian!
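
For anyone hitting the same failure in a build environment, one way to pin the output encoding for a test run is the PYTHONIOENCODING environment variable.  This is a sketch of that alternative, not necessarily what was done for the Debian package (which installed a UTF-8 locale); the helper run_child and the one-line child program are assumptions made for the example.

```python
import os
import subprocess
import sys

# Child program: prints "日本語".  Raw string, so the child interpreter
# (not this one) expands the \u escapes and the command line stays ASCII.
CHILD_CODE = r"print('\u65e5\u672c\u8a9e')"


def run_child(io_encoding):
    """Run the child with stdout/stderr forced to the given encoding."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# Forcing ASCII reproduces the UnicodeEncodeError from the report.
ascii_run = run_child("ascii")

# Forcing UTF-8 lets the same program print successfully.
utf8_run = run_child("utf-8")
```

Setting a UTF-8 locale (LANG or LC_ALL) in the chroot has the same effect on the stream encoding and also fixes other locale-dependent behaviour, so it is usually the more complete fix.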

Original comment by sandro.tosi on 7 Apr 2013 at 9:55

GoogleCodeExporter commented 9 years ago
Hi Sandro,

Wonderful, thanks for letting me know that you got this sorted.  I'll close the 
issue now.

Original comment by luke@maurits.id.au on 7 Apr 2013 at 10:50