langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Enhance Error Handling in LLM Nodes: Continue Workflow Execution with Error Output. #5598

Open lukefan opened 3 days ago

lukefan commented 3 days ago

Self Checks

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Currently, when an LLM node encounters an error, the entire workflow is terminated. This behavior can be limiting and may not always be the desired outcome. I propose implementing a more flexible error handling mechanism for LLM nodes in the workflow.

Proposed improvements:

- **Continue workflow execution:** Instead of terminating the entire workflow when an LLM node encounters an error, allow the workflow to continue executing subsequent nodes.
- **Error output:** Capture and output the error message from the LLM node, making it available to downstream nodes or the final output.
- **User-defined error handling:** Implement an option for users to choose how errors should be handled on a per-node basis. Options could include:
  - Terminate workflow (current behavior)
  - Continue workflow with error output
  - Retry LLM call (with configurable retry attempts)
- **Error branching:** Allow users to define alternative workflow paths based on error conditions, similar to try-catch statements in programming.
- **Error logging:** Implement comprehensive error logging for LLM nodes to facilitate debugging and improvement of workflows.

These enhancements would provide greater flexibility and robustness in workflow design, especially when dealing with potentially unreliable or inconsistent LLM responses.
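To make the proposal concrete, here is a minimal sketch of what a per-node error policy could look like. This is not Dify's actual API; the names `ErrorPolicy` and `run_llm_node` are hypothetical illustrations of the terminate / continue / retry options described above.

```python
# Hypothetical sketch of per-node error handling for an LLM node.
# Strategies: "terminate" (current behavior), "continue" (surface the
# error as node output), "retry" (retry, then continue with the error).
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ErrorPolicy:
    strategy: str = "terminate"  # "terminate" | "continue" | "retry"
    max_retries: int = 0


def run_llm_node(call: Callable[[], Any], policy: ErrorPolicy) -> dict:
    """Run an LLM call under a user-chosen error policy."""
    attempts = policy.max_retries + 1 if policy.strategy == "retry" else 1
    last_error = None
    for _ in range(attempts):
        try:
            return {"status": "success", "output": call(), "error": None}
        except Exception as exc:  # capture the provider error message
            last_error = str(exc)
    if policy.strategy == "terminate":
        # current behavior: the error stops the whole workflow
        raise RuntimeError(f"LLM node failed: {last_error}")
    # "continue", or retries exhausted: pass the error downstream instead
    return {"status": "error", "output": None, "error": last_error}
```

With a `"continue"` policy, a failing node yields `{"status": "error", ...}` that downstream nodes can branch on, rather than killing the run.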

Example scenario: I recently encountered an issue where Claude 3.5 Sonnet reported an error due to perceived illegal content, while GPT-4 processed the same input without issues. With the current implementation, the Dify workflow terminated upon receiving this error. The proposed changes would allow the workflow to continue, providing valuable information about the discrepancy between different LLM models.
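The scenario above is exactly what the proposed "error branching" would handle: route to a fallback model when the primary one rejects the input, and record the discrepancy. A hedged sketch (the function and model names are placeholders, not real Dify or provider APIs):

```python
# Hypothetical error-branching sketch: if the primary model errors
# (e.g. a content-policy rejection), take the error branch to a
# fallback model and keep the primary's error for inspection.
def call_with_fallback(primary, fallback, prompt: str) -> dict:
    try:
        return {"model": "primary", "text": primary(prompt)}
    except Exception as exc:
        # record why the primary failed, then continue the workflow
        return {
            "model": "fallback",
            "text": fallback(prompt),
            "primary_error": str(exc),
        }
```

In the Claude 3.5 Sonnet vs. GPT-4 case, the workflow would finish with the fallback's answer while `primary_error` preserves the rejection message for debugging.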

By implementing these improvements, we can create more resilient and informative workflows, enhancing the overall user experience and the utility of the platform.



2. Additional context or comments

No response

3. Can you help us with this feature?