Feature Req: Text to Speech

MSZ-MGS commented 1 year ago

Hi there,

I would like to suggest a Text to Speech feature, i.e. reading aloud what the LLM generated.

I imagine an option to have the speech start whenever the response has completed.

I am not sure whether Gradio supports this, or how feasible the implementation is. However, it would be good to have. I did some quick research and found these articles (I am a noob, so they might not be related):

Thanks!

pseudotensor commented 1 year ago

You can try out the alpha version for Speech-to-Text here. I will refine it and add TTS too: https://github.com/h2oai/h2ogpt/pull/1089

MSZ-MGS commented 1 year ago

Hi @pseudotensor,

I was traveling and away from my PC. I have tried the TTS and the experience was great. Thank you very much for the hard work.

The first time I tried it, I received this error: soundfile, librosa, and wavio not installed, disabling STT

So I installed all of them with pip and the TTS worked fine. I did not see them in the requirements file. As far as I can tell, there is no way to control the speed of the speech yet?
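For anyone hitting the same message, here is a minimal sketch of how to check which of the three packages named in the error are missing before launching. The helper name `missing_packages` is made up for illustration and is not part of h2oGPT. It only uses the standard library.

```python
import importlib.util

# Package names taken from the error message above.
AUDIO_PACKAGES = ["soundfile", "librosa", "wavio"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(AUDIO_PACKAGES)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All audio packages are installed.")
```

This only reports whether the packages can be found. It does not import them, so a broken install could still fail at runtime.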

Please check whether the command line below is fine:

python generate.py --base_model='llama' --model_path_llama=..\AddedModels\zephyr-7B-beta\zephyr-7b-beta.Q5_K_M.gguf --prompt_type=zephyr --hf_embedding_model=hkunlp/instructor-large --score_model=None --langchain_mode='UserData' --user_path=user_path --llamacpp_dict="{'n_gpu_layers':35,'n_batch':128}" --do_sample=True --top_k_docs=-1 --temperature=0.7 --repetition_penalty=1.1 --top_p=0.9 --max_seq_len=8192 --max_input_tokens=6000 --open_browser=True --tts_coquiai_deepspeed=False --enable_sst=False --chatbot_role="Female AI Assistant" --speaker="SLT (female)"

My operating system is Windows.
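One part of the command that can be sanity-checked locally is the `--llamacpp_dict` value. The sketch below assumes the option is read as a Python dict literal. That is how it looks, but this thread does not confirm how h2oGPT actually parses it.

```python
import ast

# The value passed to --llamacpp_dict in the command above.
raw = "{'n_gpu_layers':35,'n_batch':128}"

# literal_eval only accepts Python literals, so it never executes code.
# It raises ValueError or SyntaxError if the string is malformed.
parsed = ast.literal_eval(raw)

assert isinstance(parsed, dict)
print(parsed["n_gpu_layers"], parsed["n_batch"])
```

If this runs without an exception, the quoting and braces in that argument are at least well-formed. It says nothing about whether the other flags are valid.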

pseudotensor commented 1 year ago

The Windows installation is a bit behind Linux. I'll need to go into Windows and try things out to get the install going. But if you follow docs/readme_linux.md and do what's required on Windows, it should be fine.

I added speed control in an upgrade PR:

h2oai / h2ogpt

Feature Req: Text to Speech #1088