Open tarasglek opened 11 months ago
After writing this up, I think all we need is to integrate a wake-word engine plus some voice-activity detection (VAD) to detect the end of a conversation, as a replacement for the current record button. This would get us a more powerful voice assistant than the current open-source state of the art.
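A rough sketch of how the record button could turn into a small state machine driven by a wake-word event and an end-of-speech event. The state and event names here are hypothetical, nothing in ChatCraft is called this, and the actual wake-word engine and VAD are left out:

```typescript
// Hypothetical states and events for illustration only.
type State = "idle" | "recording" | "sending";
type VoiceEvent = "wakeword" | "speech_end" | "sent";

// Pure transition function: returns the next state for a given event.
// Events that make no sense in the current state are ignored.
function nextState(state: State, event: VoiceEvent): State {
  switch (state) {
    case "idle":
      // The wake word replaces pressing the record button.
      return event === "wakeword" ? "recording" : "idle";
    case "recording":
      // End-of-speech detection replaces releasing the button.
      return event === "speech_end" ? "sending" : "recording";
    case "sending":
      return event === "sent" ? "idle" : "sending";
  }
}

// Fold a sequence of events into a final state.
function run(events: VoiceEvent[], start: State = "idle"): State {
  return events.reduce(nextState, start);
}
```

Keeping the transitions pure means the wake-word engine and the VAD can be swapped independently, since they only need to emit `wakeword` and `speech_end`.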
Lol, we don't even need VAD... we can run browser speech recognition alongside audio recording. ChatCraft wrote the whole thing for me: https://chatcraft.org/c/tarasglek/h82v1ktY0gFSvVH9yMZB3
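This is not the code from the ChatCraft link above, just a minimal sketch of the text-matching half of the idea: given a transcript string (in the browser it would presumably come from the Web Speech API's `SpeechRecognition` results), find a wake phrase and return whatever was said after it. The function names and the "hey chatcraft" phrase are made up for illustration:

```typescript
// Lowercase, replace punctuation with spaces, and collapse whitespace,
// so "Hey, ChatCraft!" compares equal to "hey chatcraft".
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the text spoken after the wake phrase, or null if the phrase
// does not occur. Matching is word-by-word, so "they chatcraft" does not match.
function extractCommand(transcript: string, wakePhrase: string): string | null {
  const words = normalize(transcript).split(" ");
  const wake = normalize(wakePhrase).split(" ");
  for (let i = 0; i + wake.length <= words.length; i++) {
    if (wake.every((w, j) => words[i + j] === w)) {
      return words.slice(i + wake.length).join(" ");
    }
  }
  return null;
}
```

Recognition would run continuously only to spot the phrase, while the separate audio recording is what actually gets sent for a proper transcription.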
So TTS will work for interacting with a phone, but not for joining meetings. Browser TTS does not work on raw audio streams, so we will have to use proper TTS APIs for those use cases.
https://github.com/GiviMAD/rustpotter-worklet?tab=readme-ov-file is one open-source option that works in the browser. https://picovoice.ai/platform/porcupine/ is a commercial option.
So one can now run whisper-tiny on WebGPU. This will probably work amazingly well for wake words: https://www.ratchet.sh/whisper-turbo
On my Mac, even the medium Whisper model runs faster than real time, completely locally.
Nicely, the whisper.cpp people did 99% of the work for this feature: https://github.com/ggerganov/whisper.cpp/tree/master/examples/command.wasm
For now the WebGPU version is more efficient. We can either wait for whisper.cpp to start supporting WebGPU, or learn what they did to make it respond to live mic input and port that to the ratchet WebGPU demo above.
Found a good VAD, and an otherwise good implementation to borrow from: https://github.com/ai-ng/swift
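To make the "end of speech" part concrete, here is a deliberately naive energy-threshold detector. It is not the VAD used in the repo linked above, just a simple stand-in, and the class name, threshold, and frame counts are assumptions picked for illustration. It reports end of speech once something loud has been heard and a run of quiet frames follows:

```typescript
// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Hypothetical helper: flags end of speech after `hangoverFrames`
// consecutive quiet frames that follow at least one loud frame.
class EndOfSpeechDetector {
  private threshold: number;
  private hangoverFrames: number;
  private heardSpeech = false;
  private silentFrames = 0;

  constructor(threshold = 0.02, hangoverFrames = 25) {
    this.threshold = threshold;           // assumed RMS level counted as speech
    this.hangoverFrames = hangoverFrames; // quiet frames tolerated mid-utterance
  }

  // Feed one frame; returns true when the utterance looks finished.
  push(frame: Float32Array): boolean {
    if (rms(frame) >= this.threshold) {
      this.heardSpeech = true;
      this.silentFrames = 0; // any loud frame resets the silence counter
      return false;
    }
    if (!this.heardSpeech) return false; // leading silence is ignored
    this.silentFrames += 1;
    return this.silentFrames >= this.hangoverFrames;
  }
}
```

A plain energy threshold will trigger on keyboard noise and miss quiet speakers, which is the main reason to borrow a model-based VAD instead.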
OpenAI did it for me: https://github.com/openai/openai-realtime-console
Use cases: a meeting bot that opens and closes Jira tickets, sends emails summarizing meetings, and turns recordings on and off.