langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Cannot add openai compatible stt model. #8811

Open tetyoka opened 2 days ago

tetyoka commented 2 days ago

Self Checks

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Just add OpenAI-API-compatible Speech2text model.

I'm using faster-whisper-server as a local OpenAI-compatible API server. It is running in another Docker container on the same computer.

It looks like every API request succeeded. However, the Speech2text model does not appear in the model list. [screenshot]

It may be caused by my local whisper server, but since the Dify app tells me "Operation successful", I think it is a Dify bug. [screenshot]
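
(For reference, the server can also be checked directly from the host with a request like the sketch below. The port, model name, and audio file are placeholders for whatever faster-whisper-server is actually configured with; this is not Dify code, just a manual probe of the OpenAI-compatible endpoint.)

import requests

# Placeholder values -- adjust to the actual faster-whisper-server setup.
ENDPOINT = "http://localhost:8000/v1/audio/transcriptions"
MODEL = "Systran/faster-whisper-small"

with open("sample.wav", "rb") as f:
    # Same request shape that Dify's OpenAI-compatible Speech2text model sends.
    response = requests.post(
        ENDPOINT,
        data={"model": MODEL},
        files=[("file", f)],
    )

print(response.status_code)
print(response.text)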

✔️ Expected Behavior

The STT model appears in the model list. If the operation fails, I get a notification.

❌ Actual Behavior

The whole process looks like it succeeded, but the STT model does not appear.

dosubot[bot] commented 2 days ago

Hey @tetyoka! I'm here to help you with any bugs, questions, or contributions you have for Dify. Let's get this issue sorted out together!

There are a few similar issues and discussions related to adding OpenAI-compatible speech-to-text (STT) models in Dify:

  1. No voice to text feature in my 0.8.3 version after self-host deployment [1].
  2. Help: How can the locally deployed Qwen 1.5-72b model be added to Dify [2].

A common cause when an OpenAI-compatible Speech2text model does not appear in the model list, despite successful API requests and a success notification, is a failed credentials validation, which prevents the model from being listed.

Ensure that the openai_api_key provided in the credentials is correct and valid. If the credentials are invalid, the model will not be listed. Additionally, make sure that the local faster-whisper-server is properly configured and accessible.

Here is a relevant snippet from the test script that includes a validation step:

import os

import pytest

# OpenAISpeech2TextModel and CredentialsValidateFailedError are imported from
# Dify's model runtime in the full test module.
def test_validate_credentials():
    model = OpenAISpeech2TextModel()

    # Invalid credentials must raise CredentialsValidateFailedError.
    with pytest.raises(CredentialsValidateFailedError):
        model.validate_credentials(model="whisper-1", credentials={"openai_api_key": "invalid_key"})

    # Valid credentials must pass without raising.
    model.validate_credentials(model="whisper-1", credentials={"openai_api_key": os.environ.get("OPENAI_API_KEY")})

The OAICompatSpeech2TextModel class in the Dify codebase shows that the _invoke method is responsible for making the API call to the speech-to-text service. This method constructs the request using the provided api_key and endpoint_url from the credentials. If the endpoint_url is not correctly configured or if the API key is invalid, the request will fail, and the model will not be listed.

def _invoke(self, model: str, credentials: dict, file: IO[bytes], user: Optional[str] = None) -> str:
    headers = {}

    # Attach the API key from the credentials, if one was provided.
    api_key = credentials.get("api_key")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    # Build the transcription URL from the configured endpoint_url.
    endpoint_url = credentials.get("endpoint_url")
    if not endpoint_url.endswith("/"):
        endpoint_url += "/"
    endpoint_url = urljoin(endpoint_url, "audio/transcriptions")

    # Send the audio file to the OpenAI-compatible audio/transcriptions route.
    payload = {"model": model}
    files = [("file", file)]
    response = requests.post(endpoint_url, headers=headers, data=payload, files=files)

    if response.status_code != 200:
        raise InvokeBadRequestError(response.text)
    response_data = response.json()
    return response_data["text"]

Make sure that the endpoint_url and api_key in the credentials are correctly set and that the local faster-whisper-server is running and accessible at the specified endpoint_url [3][4][5].
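
As a quick illustration of how that URL is built (a minimal sketch of the urljoin logic above, with placeholder endpoint URLs), note that the path part of endpoint_url matters: faster-whisper-server typically serves the OpenAI-compatible routes under /v1, so leaving that prefix out of endpoint_url sends the request to a different route.

from urllib.parse import urljoin

def transcription_url(endpoint_url: str) -> str:
    # Mirrors the path construction in OAICompatSpeech2TextModel._invoke above.
    if not endpoint_url.endswith("/"):
        endpoint_url += "/"
    return urljoin(endpoint_url, "audio/transcriptions")

# Placeholder endpoint URLs for illustration:
print(transcription_url("http://whisper:8000/v1"))  # http://whisper:8000/v1/audio/transcriptions
print(transcription_url("http://whisper:8000"))     # http://whisper:8000/audio/transcriptions (likely a 404)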

To continue talking to Dosu, mention @dosu.

crazywoola commented 2 days ago

I think you need to refresh the page to see the newly added model.

tetyoka commented 2 days ago

I did. The model does not appear.

crazywoola commented 2 days ago

[screenshot]

Here?

tetyoka commented 2 days ago

[screenshot] It says 0 models. And also: [screenshot]

tetyoka commented 2 days ago

Despite my STT situation, I can add an OpenAI-compatible LLM API (oobabooga). [screenshot] The LLM API is running on another computer.