pentaho-labs / pentaho-cpython-plugin

This is a PDI plugin that allows execution of Python code.
Apache License 2.0

Microseconds on date fields in output dataframe interpreted incorrectly #35

Open ppojawa opened 2 years ago

ppojawa commented 2 years ago

With the "Include Input Fields as Output Fields" option unchecked and "Python Variables to Get" set to the name of a dataframe object holding the output data, a problem appears when that dataframe contains a date column with fractional seconds. The method pyServer.send_rows() formats the dates with the pattern '%Y-%m-%d %H:%M:%S.%f', where the last directive, '%f', produces six digits holding the number of microseconds.
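To illustrate the formatting side, here is a minimal sketch using only the standard library. The pattern is the one quoted above; `format_millis` is a hypothetical helper, not part of pyServer.py, showing one possible way to emit three fractional digits by truncating (not rounding) the microseconds.

```python
from datetime import datetime

# Pattern quoted in this report for pyServer.send_rows()
PATTERN = "%Y-%m-%d %H:%M:%S.%f"


def format_millis(value: datetime) -> str:
    """Hypothetical helper: keep only the first three of the six %f digits."""
    return value.strftime(PATTERN)[:-3]


stamp = datetime(2022, 5, 16, 12, 0, 0, 123000)  # 123 ms stored as 123000 microseconds

six_digits = stamp.strftime(PATTERN)   # %f always emits six zero-padded digits
three_digits = format_millis(stamp)    # truncated to millisecond precision

print(six_digits)
print(three_digits)
```

Truncation loses sub-millisecond precision, so this would only be appropriate if the receiving side really is limited to milliseconds.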

However, when the same data is received by the ServerUtils.csvToRows() method, the dates are parsed with SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"). The last part, "SSS", is interpreted as a number of milliseconds, even if the number is longer than three digits.

This results in unintended changes of date values.

Example: the value 2022-05-16 12:00:00.123 in the output dataframe is written as 2022-05-16 12:00:00.123000 in the CSV sent by pyServer.py. ServerUtils.csvToRows() then reads it as 2022-05-16 12:00:00 + 123000 milliseconds, which is equal to 2022-05-16 12:00:00 + 123 seconds, i.e. 2022-05-16 12:02:03.
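The arithmetic in this example can be reproduced with a small simulation. This is only a sketch of the behaviour described in the report, not the plugin's Java code: `lenient_millis_parse` is a hypothetical stand-in that treats every digit after the decimal point as a millisecond count and lets the overflow roll into seconds and minutes.

```python
from datetime import datetime, timedelta


def lenient_millis_parse(text: str) -> datetime:
    """Hypothetical stand-in for a lenient 'yyyy-MM-dd HH:mm:ss.SSS' parse."""
    whole, fraction = text.split(".")
    base = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    # The whole digit run is read as milliseconds, however long it is
    return base + timedelta(milliseconds=int(fraction))


sent = "2022-05-16 12:00:00.123000"       # six digits, as written to the CSV
received = lenient_millis_parse(sent)     # 123000 ms = 123 s = 2 min 3 s

print(received)  # 2022-05-16 12:02:03
```

With a three-digit fraction such as "2022-05-16 12:00:00.123", the same function returns the intended value, which suggests the mismatch lies purely in the number of fractional digits exchanged.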

My environment is:

Windows 10
Java: AdoptOpenJDK 1.8.0_202-b08
Pentaho v. 9.2
CPython Script Executor v. 1.5, installed from Marketplace