pentaho-labs / pentaho-cpython-plugin

This is a PDI plugin that allows execution of Python code.
Apache License 2.0

Data transmission not working correctly on locales with comma as decimal separator #29

Open: eott-siz opened this issue 5 years ago

eott-siz commented 5 years ago

When a field of type Number in a Kettle transformation is passed to a Python script through the pandas DataFrame defined in the step as the input frame, the resulting DataFrame contains one more column than the input row if the script runs on a machine whose locale uses "," as the decimal separator.

Presumably this happens because the data from the Kettle transformation is sent via a temporary CSV file, which is then parsed into a DataFrame in Python. Number fields are not quoted, and the number is formatted according to the locale, so an unquoted comma appears in the row and pandas parses the number as two separate columns.
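The suspected mechanism can be reproduced with pandas alone. This is a minimal sketch, assuming the transfer file is a plain comma-delimited CSV with no header row; the row contents are made up for illustration and are not taken from the plugin's code. It also shows that quoting the number field and telling pandas the decimal separator (`decimal=","`) gives back the expected column count.

```python
import io

import pandas as pd

# Hypothetical CSV rows, assuming a Kettle Number field is written with a
# comma decimal separator (3.14 -> "3,14") and is NOT quoted.
# This models the suspected transfer format, not the plugin's actual output.
unquoted = '"widget",3,14\n"gadget",2,5\n'

# The same rows with the number field quoted.
quoted = '"widget","3,14"\n"gadget","2,5"\n'

# header=None treats every line as data, so the column count is easy to see.
broken = pd.read_csv(io.StringIO(unquoted), header=None)

# With quoting, the comma stays inside one field; decimal="," lets pandas
# convert "3,14" to the float 3.14.
fixed = pd.read_csv(io.StringIO(quoted), header=None, decimal=",")

print(broken.shape)  # (2, 3): each number was split into two columns
print(fixed.shape)   # (2, 2)
print(fixed[1].tolist())
```

This only illustrates the parsing side. A real fix would presumably belong where the CSV is written, for example by quoting number fields or formatting them independently of the locale.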

Note that steps to reproduce are not provided due to the difficulty of mocking locales and because a solution is not expected. This issue exists in order to inform other developers of a known bug to shorten their search should they run into the same problem.