tarasglek / chatcraft.org

Developer-oriented ChatGPT clone
https://chatcraft.org/
MIT License
156 stars, 38 forks

Make some sort of a voice assistant #310

Open tarasglek opened 11 months ago

tarasglek commented 11 months ago
  1. ChatCraft provides function support: see https://blog.humphd.org/teaching-chatcraft-to-do-citations/ or https://www.youtube.com/watch?v=z7euXeDU5vM (using WebRTC!)
  2. This means we can use ChatCraft to do a lot of automation better than any Google Home or Alexa. E.g. one could combine this with hardware and say "Chatcraft, make me coffee" :)
  3. We tie it to a stable open source model to handle those requests, so it won't get dumber over time like Alexa or Google Home.
  4. Really the only thing we need is wake word support, which we can either do ourselves or integrate from Rhasspy or the Wyoming protocol. It might be easier to bypass all that and run the openWakeWord ONNX model via WebGPU.
  5. It would be nice to support the Jitsi or Discord SDKs so we could expose ChatCraft as an assistant in meetings.

Use cases: a meeting bot that opens and closes Jira tickets, sends emails to summarize meetings, and turns recordings on/off.

tarasglek commented 11 months ago

After writing this up, I think all we need is to integrate a wake word engine plus some voice-activity detection (VAD) to detect the end of a conversation, as a replacement for the current record button. This would get us a more powerful voice assistant than the current open source state of the art.
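
For illustration, here is a minimal sketch of the "VAD for end of conversation" idea using a plain energy threshold. This is an assumption-laden toy, not how any of the libraries mentioned in this thread do it: the class name `EndOfSpeechDetector`, the helper `rms`, and the default threshold and frame counts are all hypothetical and would need tuning against real microphone input.

```typescript
// Root-mean-square energy of one audio frame (samples in the range -1..1).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

// Hypothetical helper: reports end of speech once the speaker has been
// heard and then stays quiet for `maxSilentFrames` consecutive frames.
class EndOfSpeechDetector {
  private threshold: number;
  private maxSilentFrames: number;
  private heardSpeech = false;
  private silentFrames = 0;

  // Defaults are illustrative guesses, not measured values.
  constructor(threshold: number = 0.02, maxSilentFrames: number = 25) {
    this.threshold = threshold;
    this.maxSilentFrames = maxSilentFrames;
  }

  // Feed one frame; returns true when recording should stop.
  push(frame: Float32Array): boolean {
    if (rms(frame) >= this.threshold) {
      // Loud frame: the user is talking, so reset the silence counter.
      this.heardSpeech = true;
      this.silentFrames = 0;
      return false;
    }
    // Ignore silence before the user has said anything.
    if (!this.heardSpeech) return false;
    this.silentFrames += 1;
    return this.silentFrames >= this.maxSilentFrames;
  }
}
```

In a browser the frames would presumably come from an `AudioWorklet` or similar, and a `true` result from `push` would play the role of the user pressing the record button a second time.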

tarasglek commented 11 months ago

Lol, we don't even need VAD... we can run browser speech recognition alongside audio recording. ChatCraft wrote the whole thing for me: https://chatcraft.org/c/tarasglek/h82v1ktY0gFSvVH9yMZB3
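
A rough sketch of how a wake word could be picked out of a speech recognition transcript, assuming the recognizer hands us plain text. This is not the code from the linked chat: `WAKE_WORD`, `normalize`, and `parseTranscript` are hypothetical names, and the browser side (feeding transcripts in from the Web Speech API) is left out so the logic stays self-contained.

```typescript
// Hypothetical wake word; a real setup would make this configurable.
const WAKE_WORD = "chatcraft";

interface WakeResult {
  woke: boolean;   // true if the wake word was found
  command: string; // everything spoken after the wake word
}

// Lowercase, replace punctuation with spaces, and collapse whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Look for the wake word as a whole word and return the trailing command.
// Note: a recognizer may transcribe the name as two words ("chat craft"),
// which this simple sketch does not handle.
function parseTranscript(transcript: string, wakeWord: string = WAKE_WORD): WakeResult {
  const words = normalize(transcript).split(" ");
  const index = words.indexOf(wakeWord);
  if (index === -1) {
    return { woke: false, command: "" };
  }
  return { woke: true, command: words.slice(index + 1).join(" ") };
}
```

With this, `parseTranscript("Chatcraft, make me coffee")` yields a `command` of `"make me coffee"`, which could then be sent along as the prompt while the parallel audio recording is kept or discarded.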

tarasglek commented 10 months ago

So TTS will work for interaction with a phone but will not work for joining meetings. Browser TTS does not work on raw audio streams... we will have to use proper TTS APIs for those use cases.

tarasglek commented 10 months ago

https://github.com/GiviMAD/rustpotter-worklet?tab=readme-ov-file is one open source option that works in the browser. https://picovoice.ai/platform/porcupine/ is a commercial option.

tarasglek commented 7 months ago

So one can now use whisper-tiny in WebGPU. This will probably work amazingly well for wake words: https://www.ratchet.sh/whisper-turbo

On my mac even the medium model is faster than real-time for processing whisper completely locally.

tarasglek commented 6 months ago

Nicely, the whisper.cpp people did 99% of the work for this feature: https://github.com/ggerganov/whisper.cpp/tree/master/examples/command.wasm

For now the WebGPU version is more efficient. We can either wait for whisper.cpp to start supporting WebGPU, or learn what they did to make it respond to live mic input and port that to the ratchet WebGPU demo above.

tarasglek commented 3 months ago

Found a good VAD and an otherwise good implementation to borrow from: https://github.com/ai-ng/swift

tarasglek commented 1 month ago

OpenAI did it for me: https://github.com/openai/openai-realtime-console