tarasglek / chatcraft.org

Developer-oriented ChatGPT clone
https://chatcraft.org/
MIT License

Make some sort of a voice assistant #310

Open tarasglek opened 8 months ago

tarasglek commented 8 months ago
  1. ChatCraft supports function calling: see https://blog.humphd.org/teaching-chatcraft-to-do-citations/ or https://www.youtube.com/watch?v=z7euXeDU5vM (using WebRTC!)
  2. This means we can use ChatCraft to do a lot of automation better than any Google Home or Alexa. E.g., one could combine this with hardware and say "Chatcraft, make me coffee" :)
  3. We tie it to a stable open source model to handle those requests, so it won't get dumber over time like Alexa or Google Home.
  4. Really the only thing we need is wakeword support, which we can either do ourselves or integrate from Rhasspy or the Wyoming protocol. It might be easier to bypass all that and run openWakeWord ONNX via WebGPU.
  5. It would be nice to support the Jitsi or Discord SDKs so we could expose ChatCraft as an assistant in meetings.

Use cases: a meeting bot that opens and closes Jira tickets, sends emails to summarize meetings, and turns recordings on/off

tarasglek commented 8 months ago

After writing this up, I think all we need is to integrate a wakeword engine plus some voice-activity detection (VAD) for end of convo, as a replacement for the current record button. This would give us a more powerful voice assistant than the current open source state of the art.
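A minimal sketch of the "voice-activity detection for end of convo" part, assuming a simple energy threshold rather than a trained VAD model. All names (`EndOfSpeechDetector`, `VadOptions`) and the threshold values are hypothetical and untuned; a real build would probably use a proper VAD library instead.

```typescript
// Hypothetical names; thresholds are illustrative, not tuned values.
interface VadOptions {
  energyThreshold: number; // RMS above this counts as speech
  silenceFramesToEnd: number; // consecutive quiet frames that end an utterance
}

// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

class EndOfSpeechDetector {
  private opts: VadOptions;
  private speaking = false;
  private quietFrames = 0;

  constructor(opts: VadOptions) {
    this.opts = opts;
  }

  // Feed one audio frame; returns true exactly when an utterance has just ended.
  push(frame: Float32Array): boolean {
    if (rms(frame) > this.opts.energyThreshold) {
      this.speaking = true;
      this.quietFrames = 0;
      return false;
    }
    // Silence before any speech never ends an utterance.
    if (!this.speaking) return false;
    this.quietFrames += 1;
    if (this.quietFrames >= this.opts.silenceFramesToEnd) {
      this.speaking = false;
      this.quietFrames = 0;
      return true;
    }
    return false;
  }
}
```

The `true` return value is the point where the record button would otherwise have been pressed a second time, so that is where recording would stop and the audio would be sent off.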

tarasglek commented 8 months ago

Lol, we don't even need VAD... we can run browser speech recognition alongside audio recording. ChatCraft wrote the whole thing for me: https://chatcraft.org/c/tarasglek/h82v1ktY0gFSvVH9yMZB3
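A rough sketch of how a wakeword could be picked out of the text that browser speech recognition produces. This is not the code from the linked chat; `extractCommand` is a hypothetical helper, and it assumes the transcript arrives as a plain string (for example from a Web Speech API result event).

```typescript
// Hypothetical helper: returns the command spoken after the wake word,
// or null if the wake word does not appear as a whole word or phrase.
function extractCommand(transcript: string, wakeWord: string): string | null {
  // Lowercase, replace punctuation with spaces, collapse runs of whitespace.
  const normalize = (s: string): string =>
    s.toLowerCase().replace(/[^a-z0-9\s]/g, " ").replace(/\s+/g, " ").trim();

  const wake = normalize(wakeWord);
  if (wake.length === 0) return null;

  // Pad with spaces so the match only succeeds on whole-word boundaries.
  const padded = ` ${normalize(transcript)} `;
  const needle = ` ${wake} `;
  const idx = padded.indexOf(needle);
  if (idx < 0) return null;

  // Everything after the wake word is treated as the command (may be empty).
  return padded.slice(idx + needle.length).trim();
}

console.log(extractCommand("Hey, ChatCraft make me coffee!", "chatcraft"));
```

An empty string means the wake word was heard with nothing after it, which could be the cue to keep listening for the actual request.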

tarasglek commented 8 months ago

So TTS will work for interaction on a phone but will not work for joining meetings. Browser TTS does not work on raw audio streams... we will have to use proper TTS APIs for those use cases.

tarasglek commented 8 months ago

https://github.com/GiviMAD/rustpotter-worklet?tab=readme-ov-file is one open source option that works in the browser. https://picovoice.ai/platform/porcupine/ is a commercial option.

tarasglek commented 4 months ago

So one can now use whisper-tiny in WebGPU. This will probably work amazingly well for wakewords: https://www.ratchet.sh/whisper-turbo

On my Mac, even the medium model runs faster than real time, with Whisper processing completely locally.

tarasglek commented 4 months ago

Nicely, the whisper.cpp people did 99% of the work for this feature: https://github.com/ggerganov/whisper.cpp/tree/master/examples/command.wasm

For now the WebGPU version is more efficient. We can either wait for whisper.cpp to start supporting that, or learn what they did to make it respond to live mic input and port it to the ratchet WebGPU demo above.
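One plausible shape for the live mic input part, sketched without checking how command.wasm actually does it: keep a rolling window of the most recent samples and periodically hand a snapshot to the model. `RollingAudioBuffer` is a hypothetical name, and the window size is an assumption.

```typescript
// Hypothetical fixed-size ring buffer that keeps only the newest samples.
class RollingAudioBuffer {
  private readonly data: Float32Array;
  private writePos = 0;
  private filled = 0;

  // capacity is in samples, e.g. 16000 * 5 for five seconds of 16 kHz mono audio.
  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new Error("capacity must be a positive integer");
    }
    this.data = new Float32Array(capacity);
  }

  // Append a chunk from the mic, overwriting the oldest samples when full.
  push(chunk: Float32Array): void {
    for (let i = 0; i < chunk.length; i++) {
      this.data[this.writePos] = chunk[i];
      this.writePos = (this.writePos + 1) % this.data.length;
    }
    this.filled = Math.min(this.data.length, this.filled + chunk.length);
  }

  // Copy out the buffered samples in chronological order, oldest first.
  snapshot(): Float32Array {
    const out = new Float32Array(this.filled);
    const start = (this.writePos - this.filled + this.data.length) % this.data.length;
    for (let i = 0; i < this.filled; i++) {
      out[i] = this.data[(start + i) % this.data.length];
    }
    return out;
  }
}
```

Each `snapshot()` would then go to whichever Whisper runtime is in use, and the resulting text could be scanned for the wakeword.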

tarasglek commented 4 weeks ago

Found a good VAD and an otherwise good implementation to borrow from: https://github.com/ai-ng/swift