langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

llm node has been in Running state for a long time and cannot be terminated #8756

Open deific opened 3 days ago

deific commented 3 days ago

Self Checks

Dify version

0.8.3

Cloud or Self Hosted

Cloud, Self Hosted (Docker)

Steps to reproduce

The LLM node stays in the Running state for a long time and cannot be terminated. Sometimes the model does not respond and the node remains Running indefinitely, with no exceptions or errors in the backend logs (screenshot attached). Sometimes the LLM node has already returned a response, but its status is still Running (screenshot attached).

✔️ Expected Behavior

The LLM node should either finish normally or end with an error.

❌ Actual Behavior

The LLM node stays in the Running state for a long time and never terminates.

dosubot[bot] commented 3 days ago

The LLM node in Dify 0.8.3 can remain in the "Running" state for a long time without terminating for several reasons:

  1. Context Fetching: The _fetch_context method is a generator that yields RunRetrieverResourceEvent events. If context fetching is slow, or the context data is malformed, the node can stay in the running state.

  2. Model Invocation: The _invoke_llm method invokes the large language model and consumes the result through a generator. If the invocation takes a long time, or the streaming of results stalls, the node can stay in the running state (see the sketch after this list).

  3. Event Handling: The _handle_invoke_result method processes the streamed chunks and the final result. Any delay or failure while processing them likewise keeps the node in the running state.
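
To make the second and third points concrete, here is a minimal, self-contained sketch (not Dify code; slow_stream, handle_stream, and the 5-second delay are made-up stand-ins for the real iterator and event classes): the consuming generator only emits its completion event after the provider's stream ends, so while the stream is silent the loop blocks inside next() and the node's status never changes.

import time
from typing import Iterator

def slow_stream(delay_s: float = 5.0) -> Iterator[str]:
    yield "partial answer"
    time.sleep(delay_s)   # in the failure case this wait is effectively unbounded
    yield " ...rest of answer"

def handle_stream(stream: Iterator[str]) -> Iterator[str]:
    full_text = ""
    for chunk in stream:              # blocks inside next() while the provider is silent
        full_text += chunk
        yield chunk                   # analogue of RunStreamChunkEvent
    yield f"[completed] {full_text!r}"  # analogue of ModelInvokeCompleted; emitted only after the stream ends

if __name__ == "__main__":
    for event in handle_stream(slow_stream()):
        print(event)  # the "[completed]" line appears only after the 5-second stall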

Here is the relevant code that shows these processes:

def _fetch_context(self, node_data: LLMNodeData, variable_pool: VariablePool) -> Generator[RunEvent, None, None]:
    if not node_data.context.enabled:
        return

    if not node_data.context.variable_selector:
        return

    context_value = variable_pool.get_any(node_data.context.variable_selector)
    if context_value:
        if isinstance(context_value, str):
            yield RunRetrieverResourceEvent(retriever_resources=[], context=context_value)
        elif isinstance(context_value, list):
            context_str = ""
            original_retriever_resource = []
            for item in context_value:
                if isinstance(item, str):
                    context_str += item + "\n"
                else:
                    if "content" not in item:
                        raise ValueError(f"Invalid context structure: {item}")

                    context_str += item["content"] + "\n"

                    retriever_resource = self._convert_to_original_retriever_resource(item)
                    if retriever_resource:
                        original_retriever_resource.append(retriever_resource)

            yield RunRetrieverResourceEvent(
                retriever_resources=original_retriever_resource, context=context_str.strip()
            )

def _invoke_llm(
    self,
    node_data_model: ModelConfig,
    model_instance: ModelInstance,
    prompt_messages: list[PromptMessage],
    stop: Optional[list[str]] = None,
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
    db.session.close()

    invoke_result = model_instance.invoke_llm(
        prompt_messages=prompt_messages,
        model_parameters=node_data_model.completion_params,
        stop=stop,
        stream=True,
        user=self.user_id,
    )

    generator = self._handle_invoke_result(invoke_result=invoke_result)

    usage = LLMUsage.empty_usage()
    for event in generator:
        yield event
        if isinstance(event, ModelInvokeCompleted):
            usage = event.usage

    self.deduct_llm_quota(tenant_id=self.tenant_id, model_instance=model_instance, usage=usage)

def _handle_invoke_result(
    self, invoke_result: LLMResult | Generator
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
    if isinstance(invoke_result, LLMResult):
        return

    model = None
    prompt_messages: list[PromptMessage] = []
    full_text = ""
    usage = None
    finish_reason = None
    for result in invoke_result:
        text = result.delta.message.content
        full_text += text

        yield RunStreamChunkEvent(chunk_content=text, from_variable_selector=[self.node_id, "text"])

        if not model:
            model = result.model

        if not prompt_messages:
            prompt_messages = result.prompt_messages

        if not usage and result.delta.usage:
            usage = result.delta.usage

        if not finish_reason and result.delta.finish_reason:
            finish_reason = result.delta.finish_reason

    if not usage:
        usage = LLMUsage.empty_usage()

    yield ModelInvokeCompleted(text=full_text, usage=usage, finish_reason=finish_reason)

These generator-driven methods can stall at any of the points above, which leaves the node stuck in the "Running" state [1].
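
As a possible mitigation (a sketch only, not current Dify behavior; iter_with_timeout and the 60-second default are assumptions), the streamed result could be read on a worker thread with a per-chunk deadline, so a silent provider surfaces as a node error instead of an endless Running state.

import queue
import threading
from typing import Iterator

_SENTINEL = object()

def iter_with_timeout(stream: Iterator[str], per_chunk_timeout_s: float = 60.0) -> Iterator[str]:
    """Yield items from `stream`, raising TimeoutError if the next chunk is overdue."""
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        try:
            for chunk in stream:
                q.put(chunk)
            q.put(_SENTINEL)              # normal end of stream
        except Exception as exc:          # forward provider errors to the consumer
            q.put(exc)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=per_chunk_timeout_s)
        except queue.Empty:
            raise TimeoutError("no chunk received within the per-chunk deadline")
        if item is _SENTINEL:
            return
        if isinstance(item, Exception):
            raise item
        yield item

In _invoke_llm, such a wrapper could be applied to invoke_result before it is handed to _handle_invoke_result; the right deadline depends on the model and provider.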

To continue talking to Dosu, mention @dosu.