pentaho-labs / pentaho-cpython-plugin

This is a PDI plugin that allows execution of Python code.
Apache License 2.0

Step hangs when Python script contains any Japanese text #13

Open benghuduga12 opened 6 years ago

benghuduga12 commented 6 years ago

Hi,

I am facing an issue running the CPython step in a transformation when the Python script contains any Japanese text.

Here I am reading Excel files whose file names and sheet names are in Japanese text. I am using the openpyxl library to process the Excel files (read and write). When I run the same script directly from a Python environment, it runs properly. I face the problem only when I run it from the CPython Script Executor.

Any help is appreciated.

m-a-hall commented 6 years ago

I will be making a new release shortly that will, hopefully, address this issue. I've seen this issue when decoding UTF-8 bytes in Python that have been encoded to UTF-8 by Java (Java 8 at least) when the source text contains characters outside of the ASCII range. To be honest, I have no idea why this is happening. The same bytes can be decoded fine when this is done directly in Python; only the encoding in Java seems to be problematic. My fix involves base64 encoding the text (when non-ASCII characters are detected). When that is base64-decoded in Python, and the resulting bytes are then decoded as UTF-8, it seems to work fine. There is a cost in overhead, though.

Cheers, Mark.
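
For readers following along, the base64 workaround described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the function names (`is_ascii`, `encode_for_transfer`, `decode_from_transfer`) and the boolean flag are hypothetical, and the sending side, which is Java in the plugin, is written in Python here just to keep the example self-contained.

```python
import base64
from typing import Tuple


def is_ascii(text: str) -> bool:
    """Return True if every character in text is within the ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def encode_for_transfer(text: str) -> Tuple[bool, bytes]:
    """Hypothetical sender side (Java in the real plugin).

    Plain ASCII text is sent unchanged. Text with non-ASCII characters is
    encoded to UTF-8 and then base64-encoded, so only ASCII bytes are sent.
    """
    if is_ascii(text):
        return False, text.encode("ascii")
    return True, base64.b64encode(text.encode("utf-8"))


def decode_from_transfer(is_base64: bool, payload: bytes) -> str:
    """Hypothetical receiver side (Python).

    Undo base64 first when the flag is set, then decode the bytes as UTF-8.
    """
    if is_base64:
        return base64.b64decode(payload).decode("utf-8")
    return payload.decode("ascii")


# Round trip with a Japanese file name (made-up example value).
flag, payload = encode_for_transfer("売上.xlsx")
restored = decode_from_transfer(flag, payload)
```

Base64 output is about one third larger than its input, which is consistent with the overhead mentioned above; encoding only when non-ASCII characters are detected keeps that cost off plain ASCII scripts.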