feat: Volume, Speech Rate, and Pitch Controls for Text-to-Speech (TTS) Output

silentoplayz commented 3 months ago

Problem Description: Open WebUI currently lacks customization options for its text-to-speech (TTS) output: there is no volume control, speech rate adjustment, pitch adjustment, or option to have notifications read aloud. These gaps hurt the usability and accessibility of the TTS feature.

Describe the solution you'd like: I propose the implementation of the following features to enhance the TTS output customization:

A volume control slider to adjust the volume of the TTS output.
A "Speech Rate" slider to adjust the speed of the TTS output.
A "Pitch" slider enabling users to modify the voice pitch of the TTS output.
An option to enable or disable reading notifications aloud.
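
For illustration, here is a minimal sketch of how the three slider values could be validated and applied, assuming the TTS output goes through the browser's Web Speech API (`SpeechSynthesisUtterance` exposes `volume` in 0 to 1, `rate` in 0.1 to 10, and `pitch` in 0 to 2, each defaulting to 1). The names `TtsSettings`, `UtteranceLike`, `normalizeTtsSettings`, and `applyTtsSettings` are hypothetical and not taken from the Open WebUI codebase.

```typescript
// Hypothetical settings shape; the ranges mirror SpeechSynthesisUtterance.
interface TtsSettings {
  volume: number; // 0 to 1
  rate: number;   // 0.1 to 10
  pitch: number;  // 0 to 2
}

// Only the fields this sketch writes to, so it compiles without DOM typings.
interface UtteranceLike {
  volume: number;
  rate: number;
  pitch: number;
}

const DEFAULT_TTS_SETTINGS: TtsSettings = { volume: 1, rate: 1, pitch: 1 };

// Clamp a value into [min, max]; non-finite input falls back to the default.
function clamp(value: number, min: number, max: number, fallback: number): number {
  if (!Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Fill in missing values and keep every setting inside its valid range.
function normalizeTtsSettings(partial: Partial<TtsSettings>): TtsSettings {
  const d = DEFAULT_TTS_SETTINGS;
  return {
    volume: clamp(partial.volume ?? d.volume, 0, 1, d.volume),
    rate: clamp(partial.rate ?? d.rate, 0.1, 10, d.rate),
    pitch: clamp(partial.pitch ?? d.pitch, 0, 2, d.pitch),
  };
}

// Copy the normalized settings onto an utterance-like object and return it.
function applyTtsSettings<T extends UtteranceLike>(
  utterance: T,
  partial: Partial<TtsSettings>
): T {
  const settings = normalizeTtsSettings(partial);
  utterance.volume = settings.volume;
  utterance.rate = settings.rate;
  utterance.pitch = settings.pitch;
  return utterance;
}
```

In a browser, the result could then be spoken with something like `speechSynthesis.speak(applyTtsSettings(new SpeechSynthesisUtterance(text), saved))`, where `saved` stands for whatever the sliders stored. A server-side TTS engine would need its own mapping of these values.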

Alternative solution: Offer predefined volume, speed, and pitch options instead of sliders for a simpler interface.

Alternatives Considered: Manually adjusting the device's overall volume or using third-party applications to alter speech output and volume is a workaround, but an inconvenient one, which is why these controls belong in Open WebUI itself.

Additional Context: This feature request focuses on improving the text-to-speech (TTS) feature's accessibility and overall user experience. Implementing these requested features, including volume, speed, and pitch adjustments, will significantly enhance user satisfaction and convenience. It's crucial to maintain compatibility with existing features, ensuring this customization suite does not adversely impact any existing functionalities or behaviors.

dannyl1u commented 2 months ago

I think this would be a good feature. How does this look for the UI?

silentoplayz commented 2 months ago

That looks good to me @dannyl1u, although, do you think the sliders could take on a similar form as the model advanced parameter sliders? I only ask because I feel that tjbck would step in to ask the same thing eventually or even make the adjustment himself.

[Image: Screenshot 2024-03-16 141517]

P.S: Thank you for your contributions to Open WebUI!

dannyl1u commented 2 months ago

> That looks good to me @dannyl1u, although, do you think the sliders could take on a similar form as the model advanced parameter sliders? I only ask because I feel that tjbck would step in to ask the same thing eventually or even make the adjustment themselves.
>
> P.S: Thank you for your contributions to Open WebUI!

Yes! Thanks for the suggestion, I forgot those sliders existed 😆 , that's definitely the better UI and I'll reuse that!

UXVirtual commented 2 months ago

@dannyl1u Another challenge I've noticed with TTS output is that generated markdown code blocks are read aloud.

Making this a toggle option, and stripping the code blocks before the extractSentences() call when it is toggled on, would help with coding-assistant use cases.
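
A rough sketch of that stripping step, assuming the message is available as a raw markdown string before it reaches extractSentences(). The function name `stripCodeBlocks` and its `enabled` flag are hypothetical, and the signature of extractSentences() is not shown in this thread, so it is not called here.

```typescript
// Remove fenced code blocks (backtick or tilde fences) from markdown text.
// Inline code spans and unterminated fences are left untouched in this sketch.
function stripCodeBlocks(markdown: string, enabled: boolean): string {
  if (!enabled) return markdown;

  // Opening fence of 3+ backticks or tildes with an optional info string,
  // then the block body, then a closing line repeating the same fence.
  const fenced = /^[ \t]*(`{3,}|~{3,})[^\n]*\n(?:[\s\S]*?\n)?[ \t]*\1[ \t]*$/gm;

  return markdown
    .replace(fenced, "")        // drop each fenced block
    .replace(/\n{3,}/g, "\n\n") // collapse the blank lines left behind
    .trim();
}
```

With the toggle on, the returned text would be passed along for sentence extraction in place of the original message. With it off, the message is returned unchanged.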

littledot2020 commented 1 month ago

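@dannyl1u Another challenge I've noticed with TTS output is that generated markdown code blocks are read aloud.

Making this a toggle option, and stripping the code blocks before the extractSentences() call when it is toggled on, would help with coding-assistant use cases. I would also like to know how to play back the formatted content after the markdown has been converted.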

open-webui / open-webui #1331