frequency_penalty at 0 causes no response content with Phi-3-mini-4k-cpu-int4-rtn-block-32-acc-level-4-onnx

therealjohn commented 3 months ago

Setting frequency_penalty to 0 returns a response with no content. Values greater than 0 but less than 1 cause other weird responses. 1 seems to be the only reliable value, and it's unclear whether the cause is the model or something else.

POST http://127.0.0.1:5272/v1/chat/completions
content-type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": "Whats the golden ratio"
        }
    ],
    "frequency_penalty": 0,
    "model": "Phi-3-mini-4k-cpu-int4-rtn-block-32-acc-level-4-onnx"
}

You will get a response like:

{
  "model": null,
  "choices": [
    {
      "delta": {
        "role": "assistant",
        "content": "",
        "name": null,
        "tool_call_id": null,
        "function_call": null,
        "tool_calls": null
      },
      "message": {
        "role": "assistant",
        "content": "",
        "name": null,
        "tool_call_id": null,
        "function_call": null,
        "tool_calls": null
      },
      "index": 0,
      "finish_reason": "stop",
      "finish_details": null,
      "logprobs": null
    }
  ],
  "usage": null,
  "created": 1724095112,
  "id": "chat.id.2641",
  "system_fingerprint": null,
  "object": "chat.completion",
  "Successful": true,
  "error": null,
  "HttpStatusCode": 0,
  "HeaderValues": null
}

swatDong commented 2 months ago

@a1exwang - is this caused by an invalid parameter? We may want to consider adding value checks for all input parameters.

a1exwang commented 2 months ago

AITK uses ONNX Runtime GenAI for inference, and frequency_penalty is converted to repetition_penalty behind the scenes.
According to the ONNX documentation, repetition_penalty cannot be 0.
As the tooltip mentions, this parameter controls the likelihood of repetition. If you set a lower value, the model is more likely to repeat itself. That's why you see weird responses when it is set between 0 and 1.
1 is not the only reliable value. You can also set it to a value greater than 1, which decreases the likelihood of repetition further.
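
To illustrate why values below 1 encourage repetition and why 0 is degenerate, here is a minimal sketch of a multiplicative repetition penalty in the commonly used CTRL style. This is an assumption for illustration only: the thread does not confirm that ONNX Runtime GenAI applies the penalty exactly this way, and `apply_repetition_penalty` is a hypothetical helper, not part of AITK or ONNX Runtime GenAI.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of `logits` with a CTRL-style repetition penalty applied.

    Hypothetical helper for illustration; not the AITK or ONNX Runtime GenAI code.
    Tokens that were already generated have positive logits divided by `penalty`
    and non-positive logits multiplied by `penalty`.
    """
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        if score > 0:
            # Dividing by 0 has no finite result; model it as +inf here.
            adjusted[token_id] = score / penalty if penalty != 0 else math.inf
        else:
            adjusted[token_id] = score * penalty
    return adjusted


logits = [2.0, -1.0, 0.5]
seen = [0, 1]  # token ids that already appeared in the output

print(apply_repetition_penalty(logits, seen, 1.0))  # penalty 1 leaves logits unchanged
print(apply_repetition_penalty(logits, seen, 2.0))  # seen tokens become less likely
print(apply_repetition_penalty(logits, seen, 0.5))  # seen tokens become more likely
print(apply_repetition_penalty(logits, seen, 0.0))  # seen tokens collapse to inf or zero
```

Under this formulation, a penalty of 1 is a no-op, values above 1 push seen tokens down, values between 0 and 1 push them up, and 0 wipes out the scores of seen tokens entirely. That matches the behavior described above, although it may not be the whole explanation for the empty response.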

I think we can add range validation for input parameters, as @swatDong said.
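
A range check along these lines could look roughly like the sketch below. It assumes the only hard constraint is the one stated in this thread (the value cannot be 0) and additionally rejects negative and non-finite values, which is a guess rather than a documented rule. The function name `validate_frequency_penalty` is hypothetical and does not come from AITK.

```python
import math


def validate_frequency_penalty(value):
    """Validate a frequency_penalty value before it is passed to the backend.

    Hypothetical helper for illustration. Assumes the value is forwarded as
    repetition_penalty and therefore must be a finite number greater than 0.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("frequency_penalty must be a number")
    if not math.isfinite(value) or value <= 0:
        raise ValueError(
            "frequency_penalty must be a finite number greater than 0; "
            "use 1 to disable the penalty"
        )
    return float(value)


for candidate in (1, 1.5, 0, -2):
    try:
        print(candidate, "->", validate_frequency_penalty(candidate))
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

Rejecting the request with a clear error, instead of returning an empty completion with "Successful": true, would make the failure in the original report much easier to diagnose.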

microsoft / vscode-ai-toolkit

frequency_penalty at 0 causes no response content with Phi-3-mini-4k-cpu-int4-rtn-block-32-acc-level-4-onnx #90