Firefox browser cannot play TTS audio.

gjfmlj commented 3 weeks ago

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.7.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When the webapp URL is opened in Firefox, the AI's response is not played automatically as speech, and the response cannot be played manually either.

✔️ Expected Behavior

I hope to play TTS audio normally in the Firefox browser.

❌ Actual Behavior

The Firefox browser console shows the following error:

Uncaught DOMException: MediaSource.addSourceBuffer: Type not supported in MediaSource NextJS 37 6343-c531cfd2e942ddac.js:1
Cannot play media. No decoders for requested formats: audio/mpeg

dosubot[bot] commented 3 weeks ago

Hello @gjfmlj! I'm a bot that can help you solve bugs, answer questions, and become a contributor. I'm here to assist you while waiting for a human maintainer.

The error "Uncaught DOMException: MediaSource.addSourceBuffer: Type not supported in MediaSource" in Firefox is likely due to the browser not supporting the media type of the audio being used. This can happen if the audio format generated by the TTS service is not compatible with Firefox's supported media formats.

To resolve this issue, ensure that the TTS audio is in a format supported by Firefox. Commonly supported audio formats in Firefox include MP3, Ogg Vorbis, and WAV. You may need to configure the TTS service or use a tool like FFmpeg to convert the audio to a compatible format.

Here is an example of how you can use FFmpeg to convert an audio file to MP3 format:

ffmpeg -i input_audio_file -codec:a libmp3lame output_audio_file.mp3

Replace input_audio_file with the path to your original audio file and output_audio_file.mp3 with the desired output file name. This should help ensure that the audio format is compatible with Firefox ^[1].

Additionally, the configuration file tts-1-hd.yaml specifies the default audio format used by the TTS service in Dify, which is mp3 ^[2]^[3].


wellCh4n commented 3 weeks ago

This issue has nothing to do with Dify, but I'll try to answer it.

The Dify TTS API returns audio in audio/mpeg (i.e. MP3) format. For example, the OpenAI provider:

if len(content_text) > word_limit:
    sentences = self._split_text_into_sentences(content_text, max_length=word_limit)
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=min(3, len(sentences)))
    futures = [executor.submit(client.audio.speech.with_streaming_response.create, model=model,
                               response_format="mp3",
                               input=sentences[i], voice=voice) for i in range(len(sentences))]
    for index, future in enumerate(futures):
        yield from future.result().__enter__().iter_bytes(1024)

else:
    response = client.audio.speech.with_streaming_response.create(model=model, voice=voice,
                                                                  response_format="mp3",
                                                                  input=content_text.strip())

On the Dify page, audio playback uses MediaSource:

private listenMediaSource(contentType: string) {
  this.mediaSource?.addEventListener('sourceopen', () => {
    if (this.sourceBuffer)
      return

    this.sourceBuffer = this.mediaSource?.addSourceBuffer(contentType)
  })
}

Firefox's MediaSource does not support audio/mpeg, so addSourceBuffer throws.

Reference: All MIME types supported by MediaRecorder in Firefox and Chrome?
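
A minimal sketch of how the front end could detect this before calling addSourceBuffer. MediaSource.isTypeSupported and HTMLMediaElement.canPlayType are standard browser APIs; the names choosePlaybackStrategy, Probes, and browserProbes are hypothetical and not part of Dify. The probes are passed in so the decision logic can also run outside a browser.

```typescript
// Hypothetical helper (not Dify code): decide how a given MIME type can be played.
type PlaybackStrategy = 'media-source' | 'audio-element' | 'unsupported'

// The two capability checks, injected so the logic is testable without a browser.
interface Probes {
  mseSupports: (mime: string) => boolean   // wraps MediaSource.isTypeSupported
  elementCanPlay: (mime: string) => string // wraps canPlayType: '', 'maybe' or 'probably'
}

function choosePlaybackStrategy(mime: string, probes: Probes): PlaybackStrategy {
  // Prefer MediaSource, since it allows streaming playback.
  if (probes.mseSupports(mime))
    return 'media-source'
  // Otherwise fall back to a plain audio element if it reports any support.
  if (probes.elementCanPlay(mime) !== '')
    return 'audio-element'
  return 'unsupported'
}

// Browser wiring. globalThis is cast to any only so this sketch compiles
// without DOM typings; in a real front end, use MediaSource and Audio directly.
function browserProbes(): Probes {
  const g = globalThis as any
  return {
    mseSupports: mime => typeof g.MediaSource !== 'undefined' && g.MediaSource.isTypeSupported(mime),
    elementCanPlay: mime => new g.Audio().canPlayType(mime),
  }
}
```

In a browser, choosePlaybackStrategy('audio/mpeg', browserProbes()) would be evaluated before creating the SourceBuffer, so an unsupported type leads to a fallback instead of an uncaught DOMException.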

Possible solutions

Return audio in another format, e.g. "opus", "aac", "flac", or "pcm" (see OpenAI Format).
Switch to another way of playing audio that does not rely on MediaSource.
Use Chrome. 🤣
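
As a rough illustration of the "another way of playing audio" option, the sketch below buffers the whole response and plays it through a plain audio element instead of MediaSource. The function names concatChunks and playBuffered are hypothetical, not Dify code, and the approach assumes it is acceptable to wait for the full audio before playback starts.

```typescript
// Join streamed chunks into one contiguous buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0)
  const out = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset)
    offset += chunk.length
  }
  return out
}

// Browser-only: play fully buffered audio without MediaSource.
// globalThis is cast to any only so this sketch compiles without DOM typings.
function playBuffered(chunks: Uint8Array[], mime: string): Promise<void> {
  const g = globalThis as any
  const blob = new g.Blob([concatChunks(chunks)], { type: mime })
  const url: string = g.URL.createObjectURL(blob)
  const audio = new g.Audio(url)
  // Release the object URL once playback has finished.
  audio.addEventListener('ended', () => g.URL.revokeObjectURL(url), { once: true })
  return audio.play()
}
```

The trade-off is that streaming is lost: nothing is heard until the last chunk has arrived, so this fits best as a fallback for browsers where the MediaSource path is unavailable.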
