h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0
11.29k stars 1.24k forks

TTS STT integration #775

Closed by pseudotensor 8 months ago

pseudotensor commented 1 year ago

https://github.com/suno-ai/bark

SpeechT5: https://huggingface.co/microsoft/speecht5_tts https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ

https://github.com/AIGC-Audio/AudioGPT

Gradio ASR:
https://colab.research.google.com/drive/1D38wbK6v65V2BndsvhEms882BgzK1aMQ?usp=sharing
https://data-dive.com/realtime-audio-stream-keyword-monitoring-and-alerting-using-openai-whisper/
https://huggingface.co/openai/whisper-medium
https://huggingface.co/learn/audio-course/chapter2/asr_pipeline
https://www.gradio.app/guides/real-time-speech-recognition
https://discuss.huggingface.co/t/how-to-get-the-microphone-streaming-input-file-when-using-blocks/37204/2
https://github.com/gradio-app/gradio/issues/1349
https://www.gradio.app/docs/audio
https://huggingface.co/spaces/balacoon/tts
https://www.youtube.com/watch?v=jG52ot4njNs
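
For orientation, the streaming ASR links above share one pattern: the microphone delivers short audio chunks, the app keeps the audio heard so far in per-session state, and the recognizer is re-run on the accumulated buffer. Below is a minimal sketch of that accumulation step in plain Python. It is not h2oGPT's implementation: `StreamState`, `add_chunk`, and `fake_transcribe` are hypothetical names, and the recognizer is stubbed so the snippet runs without any model. In a real app the `transcribe` callable might wrap a Whisper pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StreamState:
    """Audio accumulated so far for one session (hypothetical helper)."""
    sample_rate: Optional[int] = None
    samples: List[float] = field(default_factory=list)


def normalize(chunk: List[float]) -> List[float]:
    """Scale a chunk to peak amplitude 1.0; silence is returned unchanged."""
    peak = max((abs(s) for s in chunk), default=0.0)
    return list(chunk) if peak == 0 else [s / peak for s in chunk]


def add_chunk(
    state: StreamState,
    sample_rate: int,
    chunk: List[float],
    transcribe: Callable[[int, List[float]], str],
) -> Tuple[StreamState, str]:
    """Append one microphone chunk, then transcribe the whole buffer."""
    if state.sample_rate is None:
        state.sample_rate = sample_rate
    elif state.sample_rate != sample_rate:
        raise ValueError("sample rate changed mid-stream")
    state.samples.extend(normalize(chunk))
    return state, transcribe(state.sample_rate, state.samples)


def fake_transcribe(sample_rate: int, samples: List[float]) -> str:
    """Stand-in recognizer (assumption): reports buffered duration, not text."""
    return f"{len(samples) / sample_rate:.2f}s of audio buffered"


# Two simulated microphone chunks at 16 kHz.
state = StreamState()
state, text = add_chunk(state, 16000, [0.0, 0.5, -0.25] * 1600, fake_transcribe)
state, text = add_chunk(state, 16000, [0.1, -0.2] * 2400, fake_transcribe)
print(text)
```

Re-transcribing the full buffer on every chunk is simple but gets slower as the recording grows, and normalizing each chunk separately changes relative loudness between chunks. Both are simplifications made for this sketch.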

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

pseudotensor commented 10 months ago

github TTS - Google Search https://www.google.com/search?q=github+TTS&oq=github+TTS&aqs=chrome..69i57j0i512l2j0i22i30l4j69i60.1751j0j4&sourceid=chrome&ie=UTF-8

TTS/notebooks/Tortoise.ipynb at dev · coqui-ai/TTS https://github.com/coqui-ai/TTS/blob/dev/notebooks/Tortoise.ipynb

neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality https://github.com/neonbjb/tortoise-tts

suno-ai/bark: 🔊 Text-Prompted Generative Audio Model https://github.com/suno-ai/bark

Home http://localhost:8888/tree/notebooks

Tortoise http://localhost:8888/notebooks/notebooks/Tortoise.ipynb

Home http://localhost:8888/tree

Tortoise http://localhost:8888/notebooks/Tortoise.ipynb

Real Time Speech Recognition https://www.gradio.app/guides/real-time-speech-recognition

Add option for autoplay in Audio component · Issue #1349 · gradio-app/gradio https://github.com/gradio-app/gradio/issues/1349

Add audio file to HTML audio player for autoplay functionality by tszumowski · Pull Request #8 · tszumowski/vocaltales_storyteller_chatbot https://github.com/tszumowski/vocaltales_storyteller_chatbot/pull/8

Gradio Audio Docs https://www.gradio.app/docs/audio

Text-to-Speech - a Hugging Face Space by balacoon https://huggingface.co/spaces/balacoon/tts

Real-Time Live Speech-to-Text | Streaming ASR Gradio App with Hugging Face Tutorial - YouTube https://www.youtube.com/watch?v=jG52ot4njNs

RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Voice data <= 10 mins can also be used to train a good VC model! https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

How to get the microphone streaming input file when using blocks? - Gradio - Hugging Face Forums https://discuss.huggingface.co/t/how-to-get-the-microphone-streaming-input-file-when-using-blocks/37204/2

Automatic speech recognition with a pipeline - Hugging Face Audio Course https://huggingface.co/learn/audio-course/chapter2/asr_pipeline

Near real-time transcription of a live audio stream with OpenAI Whisper for keyword monitoring - Data-Dive https://data-dive.com/realtime-audio-stream-keyword-monitoring-and-alerting-using-openai-whisper/

pseudotensor commented 10 months ago

https://discuss.huggingface.co/t/use-start-stop-button-to-record-live-audio-using-gradio-app/53313

https://discuss.huggingface.co/t/how-to-get-the-microphone-streaming-input-file-when-using-blocks/37204/2

https://huggingface.co/spaces/aadnk/faster-whisper-webui

https://www.linkedin.com/pulse/create-talking-bot-new-chatgpt-whisper-api-using-python-leo-wang/

pseudotensor commented 8 months ago

done