langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

llm node has been in Running state for a long time and cannot be terminated #8756

Open deific opened 3 days ago

deific commented 3 days ago

Self Checks

Dify version

0.8.3

Cloud or Self Hosted

Cloud, Self Hosted (Docker)

Steps to reproduce

The LLM node stays in the Running state for a long time and cannot be terminated. Sometimes the model does not respond and the node remains Running indefinitely, with no exceptions or errors in the backend logs (screenshot attached). Sometimes the LLM node has already returned a response, but its status is still Running (screenshot attached).

✔️ Expected Behavior

The LLM node should either finish normally or end with an error.

❌ Actual Behavior

The LLM node stays in the Running state for a long time and never terminates.

dosubot[bot] commented 3 days ago

The LLM node in Dify 0.8.3 can remain in the "Running" state for a long time without terminating for several reasons:

  1. Context Fetching: The _fetch_context method is a generator that yields RunRetrieverResourceEvent events. If context fetching is slow, or the context data is malformed, the node can stay in the running state.

  2. Model Invocation: The _invoke_llm method invokes the large language model and consumes the result through a generator. If the invocation takes a long time, or the streaming of results stalls, the node can stay in the running state (see the sketch after this list).

  3. Event Handling: The _handle_invoke_result method processes the streamed chunks and the final result. Any delay or failure while processing them likewise keeps the node in the running state.
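
To make the second and third points concrete, here is a minimal, self-contained sketch (not Dify code; slow_stream, handle_stream, and the 5-second delay are made-up stand-ins for the real iterator and event classes): the consuming generator only emits its completion event after the provider's stream ends, so while the stream is silent the loop blocks inside next() and the node's status never changes.

import time
from typing import Iterator

def slow_stream(delay_s: float = 5.0) -> Iterator[str]:
    yield "partial answer"
    time.sleep(delay_s)   # in the failure case this wait is effectively unbounded
    yield " ...rest of answer"

def handle_stream(stream: Iterator[str]) -> Iterator[str]:
    full_text = ""
    for chunk in stream:              # blocks inside next() while the provider is silent
        full_text += chunk
        yield chunk                   # analogue of RunStreamChunkEvent
    yield f"[completed] {full_text!r}"  # analogue of ModelInvokeCompleted; emitted only after the stream ends

if __name__ == "__main__":
    for event in handle_stream(slow_stream()):
        print(event)  # the "[completed]" line appears only after the 5-second stall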

Here is the relevant code that shows these processes:

def _fetch_context(self, node_data: LLMNodeData, variable_pool: VariablePool) -> Generator[RunEvent, None, None]:
    if not node_data.context.enabled:
        return

    if not node_data.context.variable_selector:
        return

    context_value = variable_pool.get_any(node_data.context.variable_selector)
    if context_value:
        if isinstance(context_value, str):
            yield RunRetrieverResourceEvent(retriever_resources=[], context=context_value)
        elif isinstance(context_value, list):
            context_str = ""
            original_retriever_resource = []
            for item in context_value:
                if isinstance(item, str):
                    context_str += item + "\n"
                else:
                    if "content" not in item:
                        raise ValueError(f"Invalid context structure: {item}")

                    context_str += item["content"] + "\n"

                    retriever_resource = self._convert_to_original_retriever_resource(item)
                    if retriever_resource:
                        original_retriever_resource.append(retriever_resource)

            yield RunRetrieverResourceEvent(
                retriever_resources=original_retriever_resource, context=context_str.strip()
            )

def _invoke_llm(
    self,
    node_data_model: ModelConfig,
    model_instance: ModelInstance,
    prompt_messages: list[PromptMessage],
    stop: Optional[list[str]] = None,
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
    db.session.close()

    invoke_result = model_instance.invoke_llm(
        prompt_messages=prompt_messages,
        model_parameters=node_data_model.completion_params,
        stop=stop,
        stream=True,
        user=self.user_id,
    )

    generator = self._handle_invoke_result(invoke_result=invoke_result)

    usage = LLMUsage.empty_usage()
    for event in generator:
        yield event
        if isinstance(event, ModelInvokeCompleted):
            usage = event.usage

    self.deduct_llm_quota(tenant_id=self.tenant_id, model_instance=model_instance, usage=usage)

def _handle_invoke_result(
    self, invoke_result: LLMResult | Generator
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
    if isinstance(invoke_result, LLMResult):
        return

    model = None
    prompt_messages: list[PromptMessage] = []
    full_text = ""
    usage = None
    finish_reason = None
    for result in invoke_result:
        text = result.delta.message.content
        full_text += text

        yield RunStreamChunkEvent(chunk_content=text, from_variable_selector=[self.node_id, "text"])

        if not model:
            model = result.model

        if not prompt_messages:
            prompt_messages = result.prompt_messages

        if not usage and result.delta.usage:
            usage = result.delta.usage

        if not finish_reason and result.delta.finish_reason:
            finish_reason = result.delta.finish_reason

    if not usage:
        usage = LLMUsage.empty_usage()

    yield ModelInvokeCompleted(text=full_text, usage=usage, finish_reason=finish_reason)

These generator-driven methods can stall at any of the points above, which leaves the node stuck in the "Running" state [1].
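
As a possible mitigation (a sketch only, not current Dify behavior; iter_with_timeout and the 60-second default are assumptions), the streamed result could be read on a worker thread with a per-chunk deadline, so a silent provider surfaces as a node error instead of an endless Running state.

import queue
import threading
from typing import Iterator

_SENTINEL = object()

def iter_with_timeout(stream: Iterator[str], per_chunk_timeout_s: float = 60.0) -> Iterator[str]:
    """Yield items from `stream`, raising TimeoutError if the next chunk is overdue."""
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        try:
            for chunk in stream:
                q.put(chunk)
            q.put(_SENTINEL)              # normal end of stream
        except Exception as exc:          # forward provider errors to the consumer
            q.put(exc)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=per_chunk_timeout_s)
        except queue.Empty:
            raise TimeoutError("no chunk received within the per-chunk deadline")
        if item is _SENTINEL:
            return
        if isinstance(item, Exception):
            raise item
        yield item

In _invoke_llm, such a wrapper could be applied to invoke_result before it is handed to _handle_invoke_result; the right deadline depends on the model and provider.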

To continue talking to Dosu, mention @dosu.