langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Detailed logs break down with graph_engine and iteration node #8693

Open AndyMik90 opened 1 week ago

AndyMik90 commented 1 week ago


Dify version

0.8.2

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

With 10 iteration nodes running in parallel, each containing workflows used as tools: if one workflow fails inside one of the iteration nodes, the debug logs show the error on all iteration nodes and all workflow tools, so it is not possible to debug. (Screenshot: CleanShot 2024-09-23 at 17 45 36)

✔️ Expected Behavior

Normal detailed logging that shows which workflow tool / iteration node failed.

❌ Actual Behavior

All nodes, even those that didn't fail, show the error.

Also, the nodes inside the iteration node show as being outside of the parallel nested view.

dosubot[bot] commented 1 week ago

I found a similar closed issue that might be relevant to your problem: The workflow has ended, but there is no output on the front end of the page, and the workflow keeps spinning [1].

To make the detailed logs correctly identify which workflow tool or iteration node failed when 10 iteration nodes run in parallel through the graph_engine, you can rely on the IterationRunFailedEvent and NodeRunSucceededEvent events. These events carry metadata that identifies the specific iteration node involved.

Here is a relevant code snippet showing how these events are emitted:

for event in rst:
    # Attribute every event emitted inside this iteration to this
    # iteration node, so downstream consumers can nest it correctly.
    if isinstance(event, (BaseNodeEvent | BaseParallelBranchEvent)) and not event.in_iteration_id:
        event.in_iteration_id = self.node_id
    if isinstance(event, NodeRunSucceededEvent):
        # Stamp the run result's metadata with this iteration's id and the
        # current index, so logs can show which iteration produced the node run.
        if event.route_node_state.node_run_result:
            metadata = event.route_node_state.node_run_result.metadata
            if not metadata:
                metadata = {}

            if NodeRunMetadataKey.ITERATION_ID not in metadata:
                metadata[NodeRunMetadataKey.ITERATION_ID] = self.node_id
                metadata[NodeRunMetadataKey.ITERATION_INDEX] = variable_pool.get_any(
                    [self.node_id, "index"]
                )
                event.route_node_state.node_run_result.metadata = metadata

        yield event
    elif isinstance(event, BaseGraphEvent):
        if isinstance(event, GraphRunFailedEvent):
            # The whole iteration run failed: surface a single failure event
            # that carries this iteration node's id and the error.
            yield IterationRunFailedEvent(
                iteration_id=self.id,
                iteration_node_id=self.node_id,
                iteration_node_type=self.node_type,
                iteration_node_data=self.node_data,
                start_at=start_at,
                inputs=inputs,
                outputs={"output": jsonable_encoder(outputs)},
                steps=len(iterator_list_value),
                metadata={"total_tokens": graph_engine.graph_runtime_state.total_tokens},
                error=event.error,
            )
            yield RunCompletedEvent(
                run_result=NodeRunResult(
                    status=WorkflowNodeExecutionStatus.FAILED,
                    error=event.error,
                )
            )
            return
    else:
        # Pass any other in-node event through unchanged.
        event = cast(InNodeEvent, event)
        yield event

In this code, the IterationRunFailedEvent carries the failing iteration's iteration_node_id and error directly, while each NodeRunSucceededEvent has NodeRunMetadataKey.ITERATION_ID and NodeRunMetadataKey.ITERATION_INDEX written into its run-result metadata. That metadata is what lets you trace a node run back to the specific iteration node and index that produced it [2][3].
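For example, a consumer of this event stream could pinpoint the failure along these lines (a minimal sketch, not Dify code; the helper name and import paths are assumptions based on the classes used in the snippet above):

from core.workflow.entities.node_entities import NodeRunMetadataKey
from core.workflow.graph_engine.entities.event import (
    IterationRunFailedEvent,
    NodeRunFailedEvent,
    NodeRunSucceededEvent,
)


def trace_iteration_events(events):
    """Yield one line per event, attributing it to its iteration node/index."""
    for event in events:
        if isinstance(event, IterationRunFailedEvent):
            # Failure events carry the iteration node's id and error directly.
            yield f"iteration node {event.iteration_node_id} FAILED: {event.error}"
        elif isinstance(event, (NodeRunSucceededEvent, NodeRunFailedEvent)):
            # in_iteration_id is set on every node event emitted inside an
            # iteration (see the first branch of the snippet above).
            result = event.route_node_state.node_run_result
            metadata = (result.metadata or {}) if result else {}
            index = metadata.get(NodeRunMetadataKey.ITERATION_INDEX)
            status = "failed" if isinstance(event, NodeRunFailedEvent) else "succeeded"
            yield (
                f"node {event.route_node_state.node_id} {status} "
                f"in iteration {event.in_iteration_id}, index {index}"
            )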

Additionally, the WorkflowBasedAppRunner class in the workflow_app_runner.py file handles various events and publishes them to the queue manager. This ensures that all events, including node failures and successes, are properly logged and can be traced back to the specific iteration node [4].
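Conceptually, that publishing step looks something like the sketch below (illustrative only; the QueueManager protocol and message shape are simplified stand-ins, not Dify's actual API):

from typing import Any, Protocol


class QueueManager(Protocol):
    def publish(self, message: dict[str, Any]) -> None: ...


def publish_graph_event(queue_manager: QueueManager, event: Any) -> None:
    # Keep the iteration attribution (set by the iteration node above) on the
    # published message so the log view can nest the event under the right node.
    queue_manager.publish(
        {
            "type": type(event).__name__,
            "in_iteration_id": getattr(event, "in_iteration_id", None),
            "error": getattr(event, "error", None),
        }
    )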

To continue talking to Dosu, mention @dosu.