Feature Request: A few suggestions to enhance speechgpt user experience

erbanku commented 1 year ago

My feature request is related to several problems I am experiencing with the current version of speechgpt. I am frustrated when:

The keyboard remains visible after I finish typing, which takes up screen space and makes the chat harder to read.
The keyboard still appears while I interact with the assistant using speech recognition, where it is unnecessary and distracting.
Many average users find it confusing to set the speech recognition/synthesis language and language ID. I would prefer an easier way to do this through environment variables, so that average users can get started with default configurations.
When the assistant generates a lengthy response, I have to wait for the whole answer to be generated before I can listen to or read it. Streaming output for both text and TTS would make this process smoother and more enjoyable.
I often want to replay the assistant's response or my own input via TTS but currently cannot do so, which is inconvenient when I need to review previous interactions.

1. Hide the keyboard after the user completes input and show it again after ChatGPT completes its response. The repo ddiu8081/chatgpt-demo does this well; you can take a look at it if you like.
2. Do not show the keyboard when the user interacts with the assistant via speech recognition.
3. Ability to set the default speech recognition/synthesis language and language ID via environment variables. (Many average users find setting these a bit confusing at first.)
4. Streaming output of the assistant's response, if possible, plus streaming TTS output. (This is very helpful when the assistant generates a long response.)
5. Ability to replay the assistant's response or the user's input via the TTS engine.
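
To illustrate the environment-variable idea, here is a minimal sketch of how defaults might be resolved. The variable names `SPEECH_RECOGNITION_LANGUAGE` and `SPEECH_SYNTHESIS_LANGUAGE`, the `en-US` fallback, and the `resolveSpeechDefaults` helper are all hypothetical and not part of speechgpt; the function takes the environment as a plain object so it does not depend on any particular build tool.

```typescript
// Hypothetical shape for the default language settings (not speechgpt's actual config).
interface SpeechDefaults {
  recognitionLanguage: string;
  synthesisLanguage: string;
}

// Resolve defaults from an environment-like object.
// The variable names and the "en-US" fallback are assumptions for illustration only.
function resolveSpeechDefaults(
  env: Record<string, string | undefined>,
): SpeechDefaults {
  const fallback = "en-US";
  // Use the fallback when a variable is missing, empty, or whitespace only.
  const pick = (value: string | undefined): string => {
    const trimmed = value?.trim();
    return trimmed ? trimmed : fallback;
  };
  return {
    recognitionLanguage: pick(env["SPEECH_RECOGNITION_LANGUAGE"]),
    synthesisLanguage: pick(env["SPEECH_SYNTHESIS_LANGUAGE"]),
  };
}
```

A deployment could then set only the variables it cares about, and users who never open the settings would still get a working language configuration.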


hahahumble commented 1 year ago

This is a great suggestion.
My plan is to add an option that allows users to choose whether to display the keyboard during speech recognition, as speech recognition may produce errors, and displaying the keyboard would allow users to quickly correct mistakes.
Different services support different languages and voices, so using environment variables for configuration might be complicated.
Currently, I have not found any TTS API that supports streaming. A possible solution is to split the assistant's responses into multiple sentences and send multiple requests.
This feature will be supported in future updates.
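
The sentence-splitting workaround mentioned above could look roughly like the sketch below. It is only an illustration: `splitIntoSentences` and `speakInChunks` are hypothetical helpers, the `synthesize` callback stands in for whichever TTS service is used, and the regular expression is a naive splitter that will also break on abbreviations and decimals such as "3.14".

```typescript
// Split text into sentence-sized chunks so each can be sent to a TTS service separately.
// Naive approach: a chunk ends at one or more of . ! ? (or their full-width CJK forms).
function splitIntoSentences(text: string): string[] {
  // First alternative: text followed by terminators. Second: trailing text without one.
  const matches = text.match(/[^.!?。！？]+[.!?。！？]+|[^.!?。！？]+$/g);
  return (matches ?? []).map((s) => s.trim()).filter((s) => s.length > 0);
}

// Send one request per sentence, in order, so playback can begin after the
// first sentence instead of after the whole response.
// `synthesize` is a placeholder for the actual TTS call (an assumption here).
async function speakInChunks(
  text: string,
  synthesize: (sentence: string) => Promise<void>,
): Promise<number> {
  const sentences = splitIntoSentences(text);
  for (const sentence of sentences) {
    await synthesize(sentence);
  }
  return sentences.length;
}
```

Awaiting each request in sequence keeps the audio in order; a more elaborate version might prefetch the next sentence while the current one is playing.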

Thank you very much for your suggestions.

Misaka-9982-coder commented 1 year ago

Perhaps these two bots can bring some inspiration. Samantha: https://t.me/samantha_x64_bot Sherlock: https://t.me/sherlock_myshell_ai_bot

hahahumble commented 1 year ago

Suggestions 1, 2, and 5 have been resolved.

hahahumble / speechgpt