langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Firefox browser cannot play TTS audio. #7492

Open gjfmlj opened 3 weeks ago

gjfmlj commented 3 weeks ago

Dify version

0.7.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When the webapp URL is opened in Firefox, the AI's response is not played back automatically as speech, and playback cannot be started manually either.

✔️ Expected Behavior

TTS audio should play normally in the Firefox browser.

❌ Actual Behavior

The Firefox browser console shows the following error:

Uncaught DOMException: MediaSource.addSourceBuffer: Type not supported in MediaSource
NextJS 37 6343-c531cfd2e942ddac.js:1
Cannot play media. No decoders for requested formats: audio/mpeg

dosubot[bot] commented 3 weeks ago

Hello @gjfmlj! I'm a bot that can help you solve bugs, answer questions, and become a contributor. I'm here to assist you while waiting for a human maintainer.

The error "Uncaught DOMException: MediaSource.addSourceBuffer: Type not supported in MediaSource" in Firefox is likely due to the browser not supporting the media type of the audio being used. This can happen if the audio format generated by the TTS service is not compatible with Firefox's supported media formats.

To resolve this issue, ensure that the TTS audio is in a format supported by Firefox. Commonly supported audio formats in Firefox include MP3, Ogg Vorbis, and WAV. You may need to configure the TTS service or use a tool like FFmpeg to convert the audio to a compatible format.

Here is an example of how you can use FFmpeg to convert an audio file to MP3 format:

ffmpeg -i input_audio_file -codec:a libmp3lame output_audio_file.mp3

Replace input_audio_file with the path to your original audio file and output_audio_file.mp3 with the desired output file name. This should help ensure that the audio format is compatible with Firefox [1].

Additionally, the configuration file tts-1-hd.yaml specifies the default audio format used by the TTS service in Dify, which is mp3 [2][3].

wellCh4n commented 3 weeks ago

This issue is not caused by Dify itself, but I'll try to explain it.

  1. Dify's TTS API returns audio in audio/mpeg (i.e. MP3) format. For example, the OpenAI provider:

    if len(content_text) > word_limit:
        # Long text: split into sentences and synthesize up to 3 at a time.
        sentences = self._split_text_into_sentences(content_text, max_length=word_limit)
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=min(3, len(sentences)))
        futures = [executor.submit(client.audio.speech.with_streaming_response.create, model=model,
                                   response_format="mp3",
                                   input=sentences[i], voice=voice) for i in range(len(sentences))]
        # Stream each sentence's MP3 bytes in submission order.
        for index, future in enumerate(futures):
            yield from future.result().__enter__().iter_bytes(1024)
    else:
        # Short text: a single streaming request, also asking for MP3.
        response = client.audio.speech.with_streaming_response.create(model=model, voice=voice,
                                                                      response_format="mp3",
                                                                      input=content_text.strip())
  2. The Dify web page plays the audio through MediaSource, passing the content type to addSourceBuffer:

    private listenMediaSource(contentType: string) {
      this.mediaSource?.addEventListener('sourceopen', () => {
        if (this.sourceBuffer)
          return

        this.sourceBuffer = this.mediaSource?.addSourceBuffer(contentType)
      })
    }
  3. Firefox's MediaSource implementation does not support audio/mpeg, so addSourceBuffer throws.

Reference: All MIME types supported by MediaRecorder in Firefox and Chrome?
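
The failure in step 3 can be detected up front with the standard static `MediaSource.isTypeSupported()` check instead of letting `addSourceBuffer` throw. The following is a minimal sketch, not Dify's actual code: `choosePlaybackStrategy`, `TypeSupportCheck`, and the strategy names are hypothetical, and the support check is injectable so the logic can be exercised outside a browser.

```typescript
// Predicate with the same shape as the static MediaSource.isTypeSupported().
type TypeSupportCheck = (mimeType: string) => boolean

type PlaybackStrategy = 'mediasource' | 'buffered'

// Hypothetical helper (not part of Dify): decide how to play a response
// before calling addSourceBuffer(), which throws on unsupported types.
function choosePlaybackStrategy(
  contentType: string,
  isTypeSupported?: TypeSupportCheck,
): PlaybackStrategy {
  // Use the browser's MediaSource when no predicate is injected.
  const mediaSource = (globalThis as any).MediaSource
  const check: TypeSupportCheck | undefined
    = isTypeSupported
    ?? (mediaSource ? (t: string) => mediaSource.isTypeSupported(t) : undefined)

  // Stream through MediaSource only when the type is reported as supported.
  if (check && check(contentType))
    return 'mediasource'

  // Otherwise fall back to a non-streaming playback path.
  return 'buffered'
}
```

In `listenMediaSource`, a check like this could guard the `addSourceBuffer(contentType)` call, so that a browser reporting `audio/mpeg` as unsupported takes a different playback path rather than raising a DOMException.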

Possible solutions

  1. Return the audio in another format, e.g. "opus", "aac", "flac", or "pcm" (see OpenAI Format).
  2. Switch to another way of playing audio.
  3. Use Chrome. 🤣
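
As an illustration of option 2, one alternative to MediaSource is to collect the whole response and hand it to a regular audio element through a Blob URL. This is only a sketch under assumptions: `concatChunks` and `playBuffered` are hypothetical names, the chunks are assumed to arrive as `Uint8Array` values, and it assumes the browser can decode the given content type outside MediaSource.

```typescript
// Hypothetical helper: join streamed chunks into one contiguous byte array.
function concatChunks(chunks: Uint8Array[]) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0)
  const out = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset)
    offset += chunk.byteLength
  }
  return out
}

// Browser-only sketch: play fully buffered audio without MediaSource.
async function playBuffered(chunks: Uint8Array[], contentType: string): Promise<void> {
  const blob = new Blob([concatChunks(chunks)], { type: contentType })
  const url = URL.createObjectURL(blob)
  const audio = new Audio(url)
  // Release the Blob URL once playback has finished.
  audio.addEventListener('ended', () => URL.revokeObjectURL(url), { once: true })
  await audio.play()
}
```

The trade-off is latency: playback can only start after the full response has been received, whereas the MediaSource path can begin as soon as the first chunk arrives.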