langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Compatibility Issues with OpenAI API in Agent Mode #5542

Closed · XiaoLey closed this issue 4 months ago

XiaoLey commented 4 months ago

Self Checks

Dify version

0.6.11

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

Description:

I have spent considerable time tracking down these issues. There are unresolved compatibility problems between the openai_api_compatible model provider and the Agent, including:

  1. Responses containing formatting artifacts.
  2. The memory function incorrectly recalling the assistant's replies.

I discovered these issues while using the gpt-4o model from the OpenRouter provider, which is implemented based on the openai_api_compatible model provider.

[Screenshot: Snipaste_2024-06-18_17-29-14]

  1. Use the gpt-4o model from the OpenRouter provider in Agent mode.
  2. Observe responses for formatting artifacts.
  3. Test memory function for recalling previous interactions.
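
For reference, a minimal sketch of calling the same model through OpenRouter's OpenAI-compatible endpoint outside of Dify, so the raw completion can be compared with what the Agent renders (the model id, prompt, and OPENROUTER_API_KEY environment variable are assumptions):

    # Sketch: query gpt-4o directly via OpenRouter's OpenAI-compatible API.
    # If this raw output is clean, the formatting artifacts are introduced by the
    # Agent integration rather than by the upstream model.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",   # OpenRouter's OpenAI-compatible endpoint
        api_key=os.environ["OPENROUTER_API_KEY"],  # assumed to be set in the environment
    )

    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
    )

    print(response.choices[0].message.content)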

✔️ Expected Behavior

The OpenRouter model provider should work seamlessly with the Agent, providing correctly formatted responses and accurately recalling past interactions.

❌ Actual Behavior

The model produces responses with formatting artifacts and fails to recall previous assistant replies accurately.

dosubot[bot] commented 4 months ago

To address the compatibility issues with the gpt-4o model from the OpenRouter provider in Agent mode, you can take the following steps:

  1. Adjust the response_format Parameter:

    • This parameter allows you to specify the format that the model must output, which can help mitigate formatting artifacts.
    • You can set response_format to either text or json_object, depending on your needs (a sketch of how this maps onto the underlying request appears after this list).

    Example configuration:

    parameter_rules:
      - name: response_format
        default: text

    Or for JSON format:

    parameter_rules:
      - name: response_format
        default: json_object
  2. Adjust Memory Settings:

    • Increase the context_size and max_tokens_to_sample parameters to help the model retain more context, potentially improving recall accuracy.
    • Enable and adjust the window size in the memory settings to manage the amount of context the model retains.

    Example configuration for context_size and max_tokens_to_sample:

    - variable: context_size
      default: "4096"
    - variable: max_tokens_to_sample
      default: "4096"

    Example code to enable and adjust the window size:

    // Toggles the memory window on or off, initialising it if it does not exist yet.
    const handleWindowEnabledChange = useCallback((enabled: boolean) => {
      const newPayload = produce(config.data || MEMORY_DEFAULT, (draft) => {
        if (!draft.window)
          draft.window = { enabled: false, size: 50 }

        draft.window.enabled = enabled
      })

      onChange(newPayload)
    }, [config, onChange])

    // Updates the window size: an empty value clears it; otherwise the value is parsed
    // and clamped to the 1-100 range, falling back to 50 if it is not a number.
    const handleWindowSizeChange = useCallback((size: number | string) => {
      const newPayload = produce(payload || MEMORY_DEFAULT, (draft) => {
        if (!draft.window)
          draft.window = { enabled: true, size: 50 }
        let limitedSize: null | string | number = size
        if (limitedSize === '') {
          limitedSize = null
        }
        else {
          limitedSize = parseInt(limitedSize as string, 10)
          if (isNaN(limitedSize))
            limitedSize = 50

          if (limitedSize < 1)
            limitedSize = 1

          if (limitedSize > 100)
            limitedSize = 100
        }

        draft.window.size = limitedSize as number
      })
      onChange(newPayload)
    }, [payload, onChange])
  3. Serialization and Deserialization of prompt_messages:

    • Ensure that the jsonable_encoder function correctly handles the serialization and deserialization of prompt_messages to avoid formatting artifacts and memory recall issues.

    Relevant code snippet:

    try:
        text, usage, tool_call = self._invoke_llm(
            node_data_model=node_data.model,
            model_instance=model_instance,
            prompt_messages=prompt_messages,
            tools=prompt_message_tools,
            stop=model_config.stop,
        )
        process_data['usage'] = jsonable_encoder(usage)
        process_data['tool_call'] = jsonable_encoder(tool_call)
        process_data['llm_text'] = text
    except Exception as e:
        return NodeRunResult(
            status=WorkflowNodeExecutionStatus.FAILED,
            inputs=inputs,
            process_data={},
            outputs={
                '__is_success': 0,
                '__reason': str(e)
            },
            error=str(e),
            metadata={}
        )
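
Regarding step 1: on an OpenAI-style Chat Completions API, the response_format setting corresponds to the response_format field of the request. A minimal sketch of what that request looks like (the base URL, model id, prompt, and OPENROUTER_API_KEY environment variable are assumptions, not values from this issue):

    # Sketch: an OpenAI-compatible chat completion request with response_format set,
    # which is what a `default: json_object` parameter rule would correspond to.
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        # The prompt must mention JSON when json_object mode is requested.
        messages=[
            {"role": "system", "content": "Reply with a JSON object containing a single 'answer' field."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        response_format={"type": "json_object"},
    )

    print(completion.choices[0].message.content)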

By implementing these adjustments, you should be able to resolve the formatting artifacts and improve the memory recall of the assistant's replies [1][2][3][4][5][6][7].

To continue talking to Dosu, mention @dosu.

XiaoLey commented 4 months ago

This issue appears to have been resolved by PR #5494, so I will be closing it now.