knime / knimepy

Throw exception when remote workflow execution fails #31

Open AlexanderFillbrunn opened 3 years ago

AlexanderFillbrunn commented 3 years ago

Currently, when I call a workflow on KNIME Server and the workflow fails for whatever reason, the output is a table with some default columns, and the execute method just returns without any indication that something went wrong.

column-string  column-int  column-double  column-long  column-boolean  \
0        value1           1            1.5         1000            True   
1        value2           2            2.5         2000           False   

  column-localdate     column-localdatetime  \
0       2018-03-27  2018-03-27T08:30:45.111   
1       2018-03-28  2018-03-28T08:30:45.111   

                          column-zoneddatetime  
0  2018-03-27T08:30:45.111+01:00[Europe/Paris]  
1  2018-03-28T08:30:45.111+01:00[Europe/Paris]

It would be much better if an exception were thrown or the execute method returned some status object, so we would not have to rely on checking the column names of the output table to know whether an error occurred.

applio commented 3 years ago

At the time of this writing, the KNIME Server responds with a code of 200 for a request to execute a workflow where an exception / error occurs during the execution of that workflow. To be clear, the KNIME Server also responds with a code of 200 for a request to execute a workflow where no problems occur and the workflow executes happily to completion.

From a user's perspective, one might expect a 5xx response code to convey that something went wrong during the requested workflow's execution and then share any and all relevant information about the error encountered. A counter-argument to this idea might be that the Server performed its full duty by executing the workflow and capturing the result even if that result is failure, therefore a response code of 200 is appropriate as a 200 indicates success. The debate of which perspective is more appropriate will not be decided here -- instead, there is a more curious behavior to consider.

Curiously, when a workflow's execution results in failure, the KNIME Server elects to send this default data table (what you shared in your original post) as part of its response. The content of this "failover" data table is controllable within KNIME as it is possible to prime the Container Table input and output nodes with your own default table of data (instead of the generic default you shared). In this way, the user may compose an appropriate table of data to share in the event of a failed workflow execution. This can be a very helpful piece of functionality.

What is arguably missing from knimepy is a mechanism to pass along to the user the full response from the KNIME Server, as certain optional fields appear in the response payload of a failed workflow execution. knimepy already raises an exception if something other than a 200 response code is received. It likewise would make sense to expose an option to trigger an exception in the event of a workflow execution failure, for those situations where knimepy can confidently interpret the optional fields in the response from the KNIME Server; that option could be turned off when the user would prefer to simply make use of the default/failover data table.
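From the caller's side, such an option might look roughly like the sketch below. This is purely illustrative: the raise_on_failure parameter and the exception type do not exist in knimepy today; Workflow and data_table_outputs are used as in the existing knimepy examples, and the URL is a placeholder.

import knime

# Placeholder server/workflow URL, as in the examples elsewhere in this thread.
wf = knime.Workflow("https://...")
try:
    # Hypothetical flag: raise when the KNIME Server reports a failed execution,
    # even though the HTTP response code is 200.
    wf.execute(raise_on_failure=True)
except RuntimeError as exc:
    print("Workflow execution failed:", exc)
else:
    # Normal outputs when the execution succeeded.
    results = wf.data_table_outputs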

applio commented 3 years ago

It does not help the situation described, but it is worth pointing out that the response status code received from the KNIME Server is exposed on every Workflow instance in the _last_status_code attribute (i.e. wf = Workflow("https://..."); wf.execute(); print(wf._last_status_code)).
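Spelled out, that inline example looks like the following (the URL is a placeholder; note that, as discussed above, the code will be 200 even when the workflow itself failed):

import knime

wf = knime.Workflow("https://...")   # placeholder server/workflow URL
wf.execute()
# HTTP status code of the last request made to the KNIME Server;
# the server returns 200 even when the workflow execution failed.
print(wf._last_status_code)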

AlexanderFillbrunn commented 3 years ago

@applio Thank you for your detailed response! Maybe the right approach would be to check the state JSON property of the response. The full response looks something like this:

{
    "id": "d0eaf48b-a943-4c5f-9ccf-87923024cc61",
    "discardAfterSuccessfulExec": false,
    "discardAfterFailedExec": false,
    "configuration": {},
    "executorName": "knime-server",
    "executorIPs": [
        "10.0.2.15"
    ],
    "executorID": "2c227028-bc19-41de-bcd6-356f32b0c8d6",
    "createdVia": "generic client",
    "state": "EXECUTION_FINISHED",
    "owner": "alexander",
    "notifications": {},
    "nodeMessages": [
        {
            "node": "Row Filter 0:44:0:29",
            "messageType": "WARNING",
            "message": "Node created an empty data table."
        }
    ],
    "isOutdated": false,
    "createdAt": "2021-05-10T08:33:09.735+02:00[Europe/Berlin]",
    "startedExecutionAt": "2021-05-10T06:33:10.014Z[UTC]",
    "workflow": "/Prod/Formel",
    "isSwapped": false,
    "hasReport": false,
    "finishedExecutionAt": "2021-05-10T08:33:10.586+02:00[Europe/Berlin]",
    "name": "Formel 2021-05-10 08.33.09",
    "properties": {
        "com.knime.enterprise.server.executor.requirements": "",
        "com.knime.enterprise.server.jobpool.size": "0"
    }
}

In newer KNIME Server versions the state for successful executions is EXECUTION_FINISHED; in older ones it was EXECUTED, if I remember correctly. Maybe knimepy could throw an exception when the state is neither of the two?
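A minimal sketch of that check, assuming the parsed JSON job record shown above is available as a Python dict (the exception class and helper name are made up for illustration):

SUCCESS_STATES = {"EXECUTION_FINISHED", "EXECUTED"}  # newer / older KNIME Server versions

class WorkflowExecutionError(RuntimeError):
    """Hypothetical exception raised when the job record reports a failed execution."""

def check_job_state(job):
    state = job.get("state")
    if state not in SUCCESS_STATES:
        # Include the node messages from the response to help diagnose the failure.
        messages = "; ".join(
            "{}: {}".format(m.get("node"), m.get("message"))
            for m in job.get("nodeMessages", [])
        )
        raise WorkflowExecutionError(
            "Workflow ended in state {!r}. {}".format(state, messages)
        )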