langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Cannot add openai compatible stt model. #8811

Open tetyoka opened 2 days ago

tetyoka commented 2 days ago

Self Checks

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Just add OpenAI-API-compatible Speech2text model.

I'm using faster-whisper-server as a local OpenAI-compatible API server. It is running in another Docker container on the same computer.

It looks like every API request succeeded. However, the Speech2text model does not appear in the model list. [screenshot]

It may be caused by my local whisper server, but since the Dify app tells me "Operation successful", I think it is a Dify bug. [screenshot]
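
(For reference, the server can also be checked directly from the host with a request like the sketch below. The port, model name, and audio file are placeholders for whatever faster-whisper-server is actually configured with; this is not Dify code, just a manual probe of the OpenAI-compatible endpoint.)

import requests

# Placeholder values -- adjust to the actual faster-whisper-server setup.
ENDPOINT = "http://localhost:8000/v1/audio/transcriptions"
MODEL = "Systran/faster-whisper-small"

with open("sample.wav", "rb") as f:
    # Same request shape that Dify's OpenAI-compatible Speech2text model sends.
    response = requests.post(
        ENDPOINT,
        data={"model": MODEL},
        files=[("file", f)],
    )

print(response.status_code)
print(response.text)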

✔️ Expected Behavior

The STT model appears in the model list. If the operation fails, I get a notification.

❌ Actual Behavior

The whole process looks like it succeeded, but the STT model does not appear.

dosubot[bot] commented 2 days ago

Hey @tetyoka! I'm here to help you with any bugs, questions, or contributions you have for Dify. Let's get this issue sorted out together!

There are a few similar issues and discussions related to adding OpenAI-compatible speech-to-text (STT) models in Dify:

  1. No voice to text feature in my 0.8.3 version after self-host deployment [1].
  2. Help: How can the locally deployed Qwen 1.5-72b model be added to Dify [2].

A common cause when an OpenAI-compatible Speech2text model does not appear in the model list, despite successful API requests and a success notification, is a failed credentials validation, which prevents the model from being listed.

Ensure that the openai_api_key provided in the credentials is correct and valid. If the credentials are invalid, the model will not be listed. Additionally, make sure that the local faster-whisper-server is properly configured and accessible.

Here is a relevant snippet from the test script that includes a validation step:

import os

import pytest

# OpenAISpeech2TextModel and CredentialsValidateFailedError are imported from
# Dify's model runtime in the full test module.
def test_validate_credentials():
    model = OpenAISpeech2TextModel()

    # Invalid credentials must raise CredentialsValidateFailedError.
    with pytest.raises(CredentialsValidateFailedError):
        model.validate_credentials(model="whisper-1", credentials={"openai_api_key": "invalid_key"})

    # Valid credentials must pass without raising.
    model.validate_credentials(model="whisper-1", credentials={"openai_api_key": os.environ.get("OPENAI_API_KEY")})

The OAICompatSpeech2TextModel class in the Dify codebase shows that the _invoke method is responsible for making the API call to the speech-to-text service. This method constructs the request using the provided api_key and endpoint_url from the credentials. If the endpoint_url is not correctly configured or if the API key is invalid, the request will fail, and the model will not be listed.

def _invoke(self, model: str, credentials: dict, file: IO[bytes], user: Optional[str] = None) -> str:
    headers = {}

    # Attach the API key from the credentials, if one was provided.
    api_key = credentials.get("api_key")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    # Build the transcription URL from the configured endpoint_url.
    endpoint_url = credentials.get("endpoint_url")
    if not endpoint_url.endswith("/"):
        endpoint_url += "/"
    endpoint_url = urljoin(endpoint_url, "audio/transcriptions")

    # Send the audio file to the OpenAI-compatible audio/transcriptions route.
    payload = {"model": model}
    files = [("file", file)]
    response = requests.post(endpoint_url, headers=headers, data=payload, files=files)

    if response.status_code != 200:
        raise InvokeBadRequestError(response.text)
    response_data = response.json()
    return response_data["text"]

Make sure that the endpoint_url and api_key in the credentials are correctly set and that the local faster-whisper-server is running and accessible at the specified endpoint_url [3][4][5].
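
As a quick illustration of how that URL is built (a minimal sketch of the urljoin logic above, with placeholder endpoint URLs), note that the path part of endpoint_url matters: faster-whisper-server typically serves the OpenAI-compatible routes under /v1, so leaving that prefix out of endpoint_url sends the request to a different route.

from urllib.parse import urljoin

def transcription_url(endpoint_url: str) -> str:
    # Mirrors the path construction in OAICompatSpeech2TextModel._invoke above.
    if not endpoint_url.endswith("/"):
        endpoint_url += "/"
    return urljoin(endpoint_url, "audio/transcriptions")

# Placeholder endpoint URLs for illustration:
print(transcription_url("http://whisper:8000/v1"))  # http://whisper:8000/v1/audio/transcriptions
print(transcription_url("http://whisper:8000"))     # http://whisper:8000/audio/transcriptions (likely a 404)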

To continue talking to Dosu, mention @dosu.

crazywoola commented 2 days ago

I think you need to refresh the page to see the newly added model.

tetyoka commented 2 days ago

I did. The model does not appear.

crazywoola commented 2 days ago

[screenshot]

Here?

tetyoka commented 2 days ago

[screenshot] It says 0 models. And also: [screenshot]

tetyoka commented 2 days ago

Despite my STT situation, I can add an OpenAI-compatible LLM API (oobabooga). [screenshot] The LLM API is running on another computer.