pentaho-labs / pentaho-cpython-plugin

This is a PDI plugin that allows execution of Python code.
Apache License 2.0

Error: Failed to read the message size from the input stream #25

Open PovilasKud opened 6 years ago

PovilasKud commented 6 years ago

PROBLEM: I ran a batch of 4,301 requests against a REST API endpoint that returns JSON responses, and the step crashes with the error "Failed to read the message size from the input stream".

Full traceback:

```
2018-09-12 12:24:40.337 INFO <Thread-404> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Dispatching started for transformation [P_get_data]
2018-09-12 12:24:40.355 INFO <P_get_data - Get rows from result> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Finished processing (I=0, O=0, R=4301, W=4301, U=0, E=0)
2018-09-12 13:01:09.892 ERROR <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Unexpected error
2018-09-12 13:01:09.893 ERROR <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] org.pentaho.di.core.exception.KettleException: java.io.IOException: Failed to read the message size from the input stream!
Failed to read the message size from the input stream!

    at org.pentaho.python.ServerUtils.receiveRowsFromPandasDataFrame(ServerUtils.java:591)
    at org.pentaho.python.PythonSession.rowsFromPythonDataFrame(PythonSession.java:462)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutorData.constructOutputRowsFromFrame(CPythonScriptExecutorData.java:238)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.executeScriptAndProcessResult(CPythonScriptExecutor.java:367)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.processBatch(CPythonScriptExecutor.java:284)
    at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.processRow(CPythonScriptExecutor.java:243)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to read the message size from the input stream!
    at org.pentaho.python.ServerUtils.readDelimitedFromInputStream(ServerUtils.java:921)
    at org.pentaho.python.ServerUtils.receiveRowsFromPandasDataFrame(ServerUtils.java:587)
    ... 7 more

2018-09-12 13:01:09.896 ERROR <Thread-404> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Errors detected!
2018-09-12 13:01:09.897 INFO <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Finished processing (I=0, O=0, R=4301, W=0, U=0, E=1)
2018-09-12 13:01:09.898 WARN <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Transformation detected one or more steps with errors.
2018-09-12 13:01:09.899 WARN <P_get_data - Get Profile Data> [/root/.kettle/data/main_py_job.kjb file:///root/.kettle/data/P_get_data.ktr] Transformation is killing the other steps!
```

I checked the server's resources (CPU, RAM), and they do not appear to be related to the error.

What might be the problem?

m-a-hall commented 5 years ago

This often happens when the Python script fails to execute, or when there is some sort of catastrophic failure in the Python micro-service (although the latter more often results in broken-socket errors). Can you run your scripts successfully outside of PDI?

usbrandon commented 5 years ago

I have this problem too. My script does execute outside of PDI.

laercioleo commented 5 years ago

The same error occurs for me. The Python script works outside of Pentaho.

Another case: a Python script that works in Pentaho. If I modify it and it fails during execution, it keeps giving the same error even after I revert it to the previous version. But if I close and reopen Pentaho, the error stops.

m-a-hall commented 5 years ago

If you are trying to retrieve the contents of a variable in Python (to pass it downstream in PDI), the variable must be JSON-serializable. If it is not, this can cause a communication failure with the micro-server. For example, NumPy arrays are not JSON-serializable, so they need to be converted to a list before they can be retrieved from Python into the step.
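A minimal sketch of that conversion, assuming it is done inside the Python script before the step retrieves the variable. The helper `make_json_safe` is hypothetical (it is not part of the plugin); it relies on NumPy arrays and NumPy scalars both exposing a `tolist()` method that returns plain Python objects:

```python
import json
from typing import Any


def make_json_safe(value: Any) -> Any:
    """Recursively convert `value` into plain, JSON-serializable Python types.

    Hypothetical helper, not part of the plugin. Objects with a tolist()
    method (NumPy arrays and NumPy scalars both have one) are converted via
    that method; dicts, lists and tuples are walked recursively; anything
    else is returned unchanged.
    """
    if isinstance(value, dict):
        # JSON object keys must be strings, so convert keys explicitly.
        return {str(k): make_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [make_json_safe(v) for v in value]
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return make_json_safe(tolist())
    return value


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        np = None

    if np is not None:
        arr = np.array([1.5, 2.5])
        try:
            json.dumps(arr)  # fails: ndarray is not JSON-serializable
        except TypeError as exc:
            print("not serializable:", exc)
        # After conversion, the same data serializes without error.
        print(json.dumps(make_json_safe({"values": arr, "mean": arr.mean()})))
```

Calling `json.dumps` on a variable outside of PDI is a cheap check: if it raises `TypeError`, retrieving that variable through the step may fail as well.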

oddworldng commented 4 years ago

I had the same problem, and the solution was to normalize all Python strings containing accent marks using the `unicodedata` library.

https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
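The comment does not say which normalization form was used. One common approach, assumed here, is to decompose characters with NFKD and then drop the combining marks that the decomposition produces, so that "añadir" becomes "anadir". Normalization alone (for example NFC) does not remove non-ASCII characters; dropping the combining marks is what does. The helper name `strip_accents` is hypothetical:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove accent marks from `text` (hypothetical helper).

    NFKD splits an accented character such as "ñ" into a base letter plus a
    combining mark (here U+0303 COMBINING TILDE); filtering out the combining
    marks leaves only the base letters.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("añadir las columnas del índice"))  # → anadir las columnas del indice
```

Assuming a pandas column that holds only strings, it could be applied with `df["name"] = df["name"].map(strip_accents)`. Characters with no decomposition, such as Chinese characters, pass through unchanged, so this does not help with the case reported in the next comment.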

qqwerty221 commented 4 years ago

Manually entering a Python script that contains Chinese characters also triggers this error:

file_obj = getObjectByPath('/test_folder')  # works
file_obj = getObjectByPath('/脚本')  # raises the exception
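If the failure is caused by non-ASCII characters in the script text itself (this thread does not confirm the root cause), one possible workaround is to keep the script pure ASCII and write such characters as escape sequences, which Python turns back into the same string at runtime. A sketch using the built-in `ascii()`; the helper name `to_ascii_literal` is hypothetical:

```python
import ast


def to_ascii_literal(text: str) -> str:
    """Return a Python string literal for `text` that contains only ASCII.

    Hypothetical helper. The built-in ascii() escapes every non-ASCII
    character (as \\xXX, \\uXXXX or \\UXXXXXXXX), so the literal can be
    pasted into a script without any raw non-ASCII characters.
    """
    return ascii(text)


if __name__ == "__main__":
    literal = to_ascii_literal("/脚本")
    print(literal)  # paste this in place of '/脚本' in the PDI script
    # Evaluating the escaped literal gives back exactly the original string.
    assert ast.literal_eval(literal) == "/脚本"
    assert literal.isascii()
```

The idea is to run this once outside of PDI and copy the printed literal into the script, for example as the argument to `getObjectByPath`.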

cdm-tao commented 2 months ago

This error also occurs if a comment in the Python code includes accented characters. For example, the comment

# Llamar a DataFrame.reset_index para añadir las columnas del índice

(Spanish for "Call DataFrame.reset_index to add the index columns") will cause the "Failed to read message size..." error. PDI then has to be relaunched; otherwise, subsequent runs of tasks containing CPython steps fail with the error "Software caused connection abort: socket write error", even after the code has been modified to remove the accents.
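Because non-ASCII characters anywhere in the script, including comments, were reported to trigger the error (and recovering requires a PDI restart), checking a script before pasting it into the step may save time. A minimal sketch that lists every non-ASCII character with its position; the helper name `find_non_ascii` is hypothetical and not part of the plugin:

```python
from typing import List, Tuple


def find_non_ascii(script: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for each non-ASCII character in `script`.

    Hypothetical helper. Lines and columns are 1-based. Comments and string
    literals are scanned too, since accents in comments alone were reported
    to cause the failure.
    """
    hits = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, ch))
    return hits


if __name__ == "__main__":
    sample = (
        "# Llamar a DataFrame.reset_index para añadir las columnas del índice\n"
        "df = df.reset_index()\n"
    )
    for line_no, col_no, ch in find_non_ascii(sample):
        print(f"line {line_no}, column {col_no}: {ch!r} (U+{ord(ch):04X})")
```

An empty result means the script is pure ASCII; otherwise, each reported character can be removed, transliterated, or escaped before the script is run in PDI.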