langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Detailed logs break down with graph_engine and iteration node #8693

Open AndyMik90 opened 1 week ago

AndyMik90 commented 1 week ago


Dify version

0.8.2

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

With 10 iteration nodes running in parallel, each containing workflows used as tools: if one workflow fails inside one of the iteration nodes, the debug logs show the error on all iteration nodes and all workflow tools, so it is not possible to debug. (Screenshot: CleanShot 2024-09-23 at 17 45 36)

✔️ Expected Behavior

Normal detailed logging that shows which workflow tool / iteration node failed.

❌ Actual Behavior

All nodes, even those that didn't fail, show the error.

Also, the nodes inside the iteration node show as being outside of the parallel nested view.

dosubot[bot] commented 1 week ago

I found a similar closed issue that might be relevant to your problem: The workflow has ended, but there is no output on the front end of the page, and the workflow keeps spinning [1].

To make the detailed logs correctly identify which workflow tool or iteration node failed when 10 iteration nodes run in parallel through the graph_engine, you can rely on the IterationRunFailedEvent and NodeRunSucceededEvent events. These events carry metadata that identifies the specific iteration node involved.

Here is a relevant code snippet showing how these events are emitted:

for event in rst:
    # Attribute every event emitted inside this iteration to this
    # iteration node, so downstream consumers can nest it correctly.
    if isinstance(event, (BaseNodeEvent | BaseParallelBranchEvent)) and not event.in_iteration_id:
        event.in_iteration_id = self.node_id
    if isinstance(event, NodeRunSucceededEvent):
        # Stamp the run result's metadata with this iteration's id and the
        # current index, so logs can show which iteration produced the node run.
        if event.route_node_state.node_run_result:
            metadata = event.route_node_state.node_run_result.metadata
            if not metadata:
                metadata = {}

            if NodeRunMetadataKey.ITERATION_ID not in metadata:
                metadata[NodeRunMetadataKey.ITERATION_ID] = self.node_id
                metadata[NodeRunMetadataKey.ITERATION_INDEX] = variable_pool.get_any(
                    [self.node_id, "index"]
                )
                event.route_node_state.node_run_result.metadata = metadata

        yield event
    elif isinstance(event, BaseGraphEvent):
        if isinstance(event, GraphRunFailedEvent):
            # The whole iteration run failed: surface a single failure event
            # that carries this iteration node's id and the error.
            yield IterationRunFailedEvent(
                iteration_id=self.id,
                iteration_node_id=self.node_id,
                iteration_node_type=self.node_type,
                iteration_node_data=self.node_data,
                start_at=start_at,
                inputs=inputs,
                outputs={"output": jsonable_encoder(outputs)},
                steps=len(iterator_list_value),
                metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
                error=event.error,
            )
            yield RunCompletedEvent(
                run_result=NodeRunResult(
                    status=WorkflowNodeExecutionStatus.FAILED,
                    error=event.error,
                )
            )
            return
    else:
        # Pass any other in-node event through unchanged.
        event = cast(InNodeEvent, event)
        yield event

In this code, the IterationRunFailedEvent carries the failing iteration's iteration_node_id and error directly, while each NodeRunSucceededEvent has NodeRunMetadataKey.ITERATION_ID and NodeRunMetadataKey.ITERATION_INDEX written into its run-result metadata. That metadata is what lets you trace a node run back to the specific iteration node and index that produced it [2][3].
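For example, a consumer of this event stream could pinpoint the failure along these lines (a minimal sketch, not Dify code; the helper name and import paths are assumptions based on the classes used in the snippet above):

from core.workflow.entities.node_entities import NodeRunMetadataKey
from core.workflow.graph_engine.entities.event import (
    IterationRunFailedEvent,
    NodeRunFailedEvent,
    NodeRunSucceededEvent,
)


def trace_iteration_events(events):
    """Yield one line per event, attributing it to its iteration node/index."""
    for event in events:
        if isinstance(event, IterationRunFailedEvent):
            # Failure events carry the iteration node's id and error directly.
            yield f"iteration node {event.iteration_node_id} FAILED: {event.error}"
        elif isinstance(event, (NodeRunSucceededEvent, NodeRunFailedEvent)):
            # in_iteration_id is set on every node event emitted inside an
            # iteration (see the first branch of the snippet above).
            result = event.route_node_state.node_run_result
            metadata = (result.metadata or {}) if result else {}
            index = metadata.get(NodeRunMetadataKey.ITERATION_INDEX)
            status = "failed" if isinstance(event, NodeRunFailedEvent) else "succeeded"
            yield (
                f"node {event.route_node_state.node_id} {status} "
                f"in iteration {event.in_iteration_id}, index {index}"
            )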

Additionally, the WorkflowBasedAppRunner class in the workflow_app_runner.py file handles various events and publishes them to the queue manager. This ensures that all events, including node failures and successes, are properly logged and can be traced back to the specific iteration node [4].
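Conceptually, that publishing step looks something like the sketch below (illustrative only; the QueueManager protocol and message shape are simplified stand-ins, not Dify's actual API):

from typing import Any, Protocol


class QueueManager(Protocol):
    def publish(self, message: dict[str, Any]) -> None: ...


def publish_graph_event(queue_manager: QueueManager, event: Any) -> None:
    # Keep the iteration attribution (set by the iteration node above) on the
    # published message so the log view can nest the event under the right node.
    queue_manager.publish(
        {
            "type": type(event).__name__,
            "in_iteration_id": getattr(event, "in_iteration_id", None),
            "error": getattr(event, "error", None),
        }
    )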

To continue talking to Dosu, mention @dosu.