langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Compatibility Issues with OpenAI API in Agent Mode #5542

Closed · XiaoLey closed this issue 4 months ago

XiaoLey commented 4 months ago

Self Checks

Dify version

0.6.11

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

Description:

I have spent considerable time tracking down these issues. There are unresolved compatibility problems between the openai_api_compatible model provider and the Agent, including:

  1. Responses containing formatting artifacts.
  2. The memory function incorrectly recalling the assistant's replies.

I discovered these issues while using the gpt-4o model from the OpenRouter provider, which is implemented based on the openai_api_compatible model provider.

[Screenshot: Snipaste_2024-06-18_17-29-14]

  1. Use the gpt-4o model from the OpenRouter provider in Agent mode.
  2. Observe responses for formatting artifacts.
  3. Test memory function for recalling previous interactions.
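
For reference, a minimal sketch of calling the same model through OpenRouter's OpenAI-compatible endpoint outside of Dify, so the raw completion can be compared with what the Agent renders (the model id, prompt, and OPENROUTER_API_KEY environment variable are assumptions):

    # Sketch: query gpt-4o directly via OpenRouter's OpenAI-compatible API.
    # If this raw output is clean, the formatting artifacts are introduced by the
    # Agent integration rather than by the upstream model.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",   # OpenRouter's OpenAI-compatible endpoint
        api_key=os.environ["OPENROUTER_API_KEY"],  # assumed to be set in the environment
    )

    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
    )

    print(response.choices[0].message.content)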

✔️ Expected Behavior

The OpenRouter model provider should work seamlessly with the Agent, providing correctly formatted responses and accurately recalling past interactions.

❌ Actual Behavior

The model produces responses with formatting artifacts and fails to recall previous assistant replies accurately.

dosubot[bot] commented 4 months ago

To address the compatibility issues with the gpt-4o model from the OpenRouter provider in Agent mode, you can take the following steps:

  1. Adjust the response_format Parameter:

    • This parameter allows you to specify the format that the model must output, which can help mitigate formatting artifacts.
    • You can set response_format to either text or json_object, depending on your needs (a sketch of how this maps onto the underlying request appears after this list).

    Example configuration:

    parameter_rules:
      - name: response_format
        default: text

    Or for JSON format:

    parameter_rules:
      - name: response_format
        default: json_object
  2. Adjust Memory Settings:

    • Increase the context_size and max_tokens_to_sample parameters to help the model retain more context, potentially improving recall accuracy.
    • Enable and adjust the window size in the memory settings to manage the amount of context the model retains.

    Example configuration for context_size and max_tokens_to_sample:

    - variable: context_size
      default: "4096"
    - variable: max_tokens_to_sample
      default: "4096"

    Example code to enable and adjust the window size:

    // Toggles the memory window on or off, initialising it if it does not exist yet.
    const handleWindowEnabledChange = useCallback((enabled: boolean) => {
      const newPayload = produce(config.data || MEMORY_DEFAULT, (draft) => {
        if (!draft.window)
          draft.window = { enabled: false, size: 50 }

        draft.window.enabled = enabled
      })

      onChange(newPayload)
    }, [config, onChange])

    // Updates the window size: an empty value clears it; otherwise the value is parsed
    // and clamped to the 1-100 range, falling back to 50 if it is not a number.
    const handleWindowSizeChange = useCallback((size: number | string) => {
      const newPayload = produce(payload || MEMORY_DEFAULT, (draft) => {
        if (!draft.window)
          draft.window = { enabled: true, size: 50 }
        let limitedSize: null | string | number = size
        if (limitedSize === '') {
          limitedSize = null
        }
        else {
          limitedSize = parseInt(limitedSize as string, 10)
          if (isNaN(limitedSize))
            limitedSize = 50

          if (limitedSize < 1)
            limitedSize = 1

          if (limitedSize > 100)
            limitedSize = 100
        }

        draft.window.size = limitedSize as number
      })
      onChange(newPayload)
    }, [payload, onChange])
  3. Serialization and Deserialization of prompt_messages:

    • Ensure that the jsonable_encoder function correctly handles the serialization and deserialization of prompt_messages to avoid formatting artifacts and memory recall issues.

    Relevant code snippet:

    try:
        text, usage, tool_call = self._invoke_llm(
            node_data_model=node_data.model,
            model_instance=model_instance,
            prompt_messages=prompt_messages,
            tools=prompt_message_tools,
            stop=model_config.stop,
        )
        process_data['usage'] = jsonable_encoder(usage)
        process_data['tool_call'] = jsonable_encoder(tool_call)
        process_data['llm_text'] = text
    except Exception as e:
        return NodeRunResult(
            status=WorkflowNodeExecutionStatus.FAILED,
            inputs=inputs,
            process_data={},
            outputs={
                '__is_success': 0,
                '__reason': str(e)
            },
            error=str(e),
            metadata={}
        )
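
Regarding step 1: on an OpenAI-style Chat Completions API, the response_format setting corresponds to the response_format field of the request. A minimal sketch of what that request looks like (the base URL, model id, prompt, and OPENROUTER_API_KEY environment variable are assumptions, not values from this issue):

    # Sketch: an OpenAI-compatible chat completion request with response_format set,
    # which is what a `default: json_object` parameter rule would correspond to.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        # The prompt must mention JSON when json_object mode is requested.
        messages=[
            {"role": "system", "content": "Reply with a JSON object containing a single 'answer' field."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        response_format={"type": "json_object"},
    )

    print(completion.choices[0].message.content)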

By implementing these adjustments, you should be able to resolve the formatting artifacts and improve the memory recall of the assistant's replies [1][2][3][4][5][6][7].

To continue talking to Dosu, mention @dosu.

XiaoLey commented 4 months ago

This issue appears to have been resolved by PR #5494, so I will be closing it now.