niner / inline-python-pm

Inline::Python - Write Perl subs and classes in Python.
https://metacpan.org/release/Inline-Python

Non-unicode strings are converted to bytes in Python 3 #27

Closed: s-nez closed this issue 4 years ago

s-nez commented 5 years ago

When Inline::Python is built against Python 3, Perl strings are converted to the str class only if they contain Unicode characters; otherwise they are converted to bytes. This breaks a lot of Python code that expects str objects.

This failing test case demonstrates the problem (it passes under Python 2):

use strict;
use warnings;
use utf8;

use Test::More tests => 2;

use Inline Config => DIRECTORY => './blib_test';
use Inline Python => <<'END';
def add_x(string):
    return 'x' + string
END

my $str_utf8  = 'abć';
my $str_ascii = 'abc';

is add_x($str_utf8),  "x$str_utf8",  'string op on unicode string';
is add_x($str_ascii), "x$str_ascii", 'string op on ascii string';

Under Python 3, the second call to add_x produces the following error:

TypeError: can only concatenate str (not "bytes") to str
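
Until this is fixed, a Python-side workaround is to accept both types and decode bytes before using them as text. The sketch below is a hypothetical defensive version of the add_x function from the test, written in plain Python 3 so it runs without Perl. It assumes the ASCII Perl string arrives as bytes, as described in this issue. It decodes with latin-1 because that codec maps each byte 0-255 to the code point of the same value, which matches how Perl treats a string that is not flagged as Unicode; if your Perl strings actually hold UTF-8 encoded bytes, "utf-8" is the better choice.

```python
def add_x(string):
    # Hypothetical workaround, not part of Inline::Python.
    # Assumption: non-Unicode Perl strings reach Python 3 as bytes.
    if isinstance(string, bytes):
        # latin-1 maps byte values 0-255 one-to-one onto code points 0-255.
        string = string.decode("latin-1")
    return "x" + string


# The original failure, reproduced with plain Python 3 values.
try:
    "x" + b"abc"
except TypeError as exc:
    print("TypeError:", exc)

print(add_x("abć"))   # xabć
print(add_x(b"abc"))  # xabc
```

The downside is that every function exposed to Perl needs this check, which is exactly the burden the issue is about.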

marekro commented 4 years ago

Same observation here. Python 3 is Unicode throughout, so it is a major inconvenience that Perl strings passed to Python arrive there as 'bytes' objects. I could make my script work with explicit utf8::upgrade() and/or Encode calls, but that has to be done on every single string, including hash keys.

I also tried "use feature qw(unicode_strings);" so that literal strings ("mystring") would be Unicode in Perl itself, but that doesn't work because of the conservative approach Perl takes on Unicode ("unless the string contains a Unicode character, it will be Latin-1 bytes").

So I would suggest some sort of global flag in Inline::Python, or a compile-time flag, that implicitly switches the string interface to UTF-8, so that any string entering the Python realm is a "str" (i.e. Unicode). I am afraid I cannot code this myself, but I will happily test and give feedback, if that is of any help.
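
As a stopgap on the Python side, the per-string conversion can be centralised instead of being repeated for every string and hash key. The sketch below is a hypothetical helper, not part of Inline::Python. to_text recursively decodes bytes inside lists, tuples and dicts, converting dict keys as well as values, since the comment above reports that hash keys need converting too. accepts_text is a hypothetical decorator that applies it to a function's arguments. The default latin-1 encoding is an assumption (it mirrors Perl's treatment of non-Unicode strings as code points 0-255), and the sketch also assumes that Inline::Python binds a decorated function like any other Python function.

```python
import functools
from typing import Any


def to_text(value: Any, encoding: str = "latin-1") -> Any:
    """Recursively replace bytes with str in nested lists, tuples and dicts."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        # Convert keys as well as values; Perl hash keys may arrive as bytes.
        return {to_text(k, encoding): to_text(v, encoding) for k, v in value.items()}
    if isinstance(value, list):
        return [to_text(item, encoding) for item in value]
    if isinstance(value, tuple):
        return tuple(to_text(item, encoding) for item in value)
    # Numbers, None, str and other objects pass through unchanged.
    return value


def accepts_text(func):
    """Hypothetical decorator: decode bytes in all arguments before the call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*to_text(args), **to_text(kwargs))
    return wrapper


@accepts_text
def describe(name, options):
    # Inside the function every string is str, including dict keys.
    return name + ": " + ", ".join(sorted(options))


print(describe(b"abc", {b"verbose": 1, "debug": 0}))  # abc: debug, verbose
```

A library-level flag, as proposed above, would still be preferable, because this approach requires decorating every entry point and does not affect values returned from Python back to Perl.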