langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Encountering an error with Azure's GPT-4o. #4593

Closed Mgrsc closed 3 months ago

Mgrsc commented 3 months ago

Self Checks

Dify version

0.6.8

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Using GPT-4o through the Azure channel.

✔️ Expected Behavior

The request completes normally.

❌ Actual Behavior

When using Azure's gpt-4o model, the request fails with the error: Unknown parameter: 'stream_options'. When addressing this issue, please be aware that the actual request is sent in OpenAI format rather than Azure format; this can happen when one-api or new-api is used to redirect requests to Azure. (Screenshot: 2024-05-22 223734)
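For reference, the underlying failure can be reproduced outside Dify with a plain OpenAI-style streaming request that includes stream_options. The sketch below is illustrative only; the proxy endpoint, API key, and model name are placeholders, not values from this issue:

```python
# Minimal reproduction sketch: an OpenAI-style streaming request redirected
# to Azure (e.g. through one-api / new-api) that includes 'stream_options'.
# All endpoint/key/model values below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<one-api-or-new-api-host>/v1",  # placeholder proxy endpoint
    api_key="<proxy-api-key>",                        # placeholder
)

# Azure's chat completions API (at the API versions discussed in this thread)
# does not recognize 'stream_options', so the upstream call is rejected with
# 400 "Unknown parameter: 'stream_options'".
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},  # the parameter Azure rejects
)
for chunk in stream:
    print(chunk)
```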

dosubot[bot] commented 3 months ago

To resolve the 400 "Bad Request" error caused by "Unknown parameter: 'stream_options'" when using Azure's GPT-4o model through the Azure channel in a self-hosted Dify 0.6.8 deployment, ensure that the stream_options parameter is not sent in your request. The YAML configuration for the GPT-4o model does not include stream_options as a valid parameter.

Here is a sample configuration without the stream_options parameter:

model: gpt-4o
label:
  zh_Hans: gpt-4o
  en_US: gpt-4o
model_type: llm
features:
  - multi-tool-call
  - agent-thought
  - stream-tool-call
  - vision
model_properties:
  mode: chat
  context_size: 128000
parameter_rules:
  - name: temperature
    use_template: temperature
  - name: top_p
    use_template: top_p
  - name: presence_penalty
    use_template: presence_penalty
  - name: frequency_penalty
    use_template: frequency_penalty
  - name: max_tokens
    use_template: max_tokens
    default: 512
    min: 1
    max: 4096
  - name: seed
    label:
      zh_Hans: 种子
      en_US: Seed
    type: int
    help:
      zh_Hans: 如果指定,模型将尽最大努力进行确定性采样,使得重复的具有相同种子和参数的请求应该返回相同的结果。不能保证确定性,您应该参考 system_fingerprint 响应参数来监视变化。
      en_US: If specified, model will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
    required: false
    precision: 2
    min: 0
    max: 1
  - name: response_format
    label:
      zh_Hans: 回复格式
      en_US: response_format
    type: string
    help:
      zh_Hans: 指定模型必须输出的格式
      en_US: specifying the format that the model must output
    required: false
    options:
      - text
      - json_object
pricing:
  input: '5.00'
  output: '15.00'
  unit: '0.000001'
  currency: USD

Make sure your request payload adheres to the parameters defined in the parameter_rules section and does not include any undefined parameters like stream_options.
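For example, a model_parameters payload that stays within the rules above might look like the following sketch (the values are arbitrary examples, not recommendations):

```python
# Illustrative sketch: only parameter names defined in parameter_rules above.
model_parameters = {
    "temperature": 0.7,
    "top_p": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "max_tokens": 512,
    "response_format": "text",
    # note: no 'stream_options' key here
}
```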

Additionally, you can refer to the integration tests provided in the Dify repository to understand how to correctly invoke the Azure OpenAI models. For example, the test_invoke_stream_completion_model function demonstrates how to invoke a model with streaming enabled without using the stream_options parameter:

@pytest.mark.parametrize('setup_openai_mock', [['completion']], indirect=True)
def test_invoke_stream_completion_model(setup_openai_mock):
    model = AzureOpenAILargeLanguageModel()

    result = model.invoke(
        model='gpt-35-turbo-instruct',
        credentials={
            'openai_api_base': os.environ.get('AZURE_OPENAI_API_BASE'),
            'openai_api_key': os.environ.get('AZURE_OPENAI_API_KEY'),
            'base_model_name': 'gpt-35-turbo-instruct'
        },
        prompt_messages=[
            UserPromptMessage(
                content='Hello World!'
            )
        ],
        model_parameters={
            'temperature': 0.0,
            'max_tokens': 100
        },
        stream=True,
        user="abc-123"
    )

    assert isinstance(result, Generator)

    for chunk in result:
        assert isinstance(chunk, LLMResultChunk)
        assert isinstance(chunk.delta, LLMResultChunkDelta)
        assert isinstance(chunk.delta.message, AssistantPromptMessage)
        assert len(chunk.delta.message.content) > 0 if chunk.delta.finish_reason is None else True

This example shows how to set up the request correctly without including the stream_options parameter.

#### Details

**Sources**

- [Added Azure OpenAI Service Model, error when selecting base model "gpt-4-turbo-2024-04-09"](https://github.com/langgenius/dify/issues/4057)
- [api/core/model_runtime/model_providers/openai/llm/gpt-4o.yaml](https://github.com/langgenius/dify/blob/main/api/core/model_runtime/model_providers/openai/llm/gpt-4o.yaml)
- [api/core/model_runtime/model_providers/azure_openai/_constant.py](https://github.com/langgenius/dify/blob/main/api/core/model_runtime/model_providers/azure_openai/_constant.py)
- [api/tests/integration_tests/model_runtime/azure_openai/test_llm.py](https://github.com/langgenius/dify/blob/main/api/tests/integration_tests/model_runtime/azure_openai/test_llm.py)
- [api/core/model_runtime/model_providers/azure_openai/llm/llm.py](https://github.com/langgenius/dify/blob/main/api/core/model_runtime/model_providers/azure_openai/llm/llm.py)
**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

dontbanmegithublol commented 3 months ago

I encountered this problem today as well. It appears that Dify has adopted the new stream_options parameter of the OpenAI API, according to this discussion. However, the Azure API typically lags behind OpenAI's and does not yet support the stream_options field. Here is the relevant code.

Disabling the stream_options field should work, but I haven't tried it myself.

dq7532183 commented 3 months ago

For now I have to modify the source code and delete this block myself. I think this parameter should be exposed as a model configuration option so users can choose whether to enable it, rather than being hard-coded.

Line 571 in api/core/model_runtime/model_providers/openai/llm/llm.py

    if stream:
        extra_model_kwargs['stream_options'] = {
            'include_usage': True
        }
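One possible direction, sketched below (this is not Dify's actual implementation): guard the assignment behind a hypothetical configuration flag so that providers or gateways that reject the parameter can opt out.

```python
# Sketch only, not Dify's code: make stream_options opt-in via a hypothetical
# 'supports_stream_options' flag read from the provider credentials/config.
supports_stream_options = credentials.get("supports_stream_options", True)  # hypothetical flag

if stream and supports_stream_options:
    extra_model_kwargs['stream_options'] = {
        'include_usage': True
    }
```
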
zhudongwork commented 1 month ago

This bug still exists; it would be good to get it fixed!