mediar-ai / screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind
https://screenpi.pe
MIT License
9.61k stars 559 forks

[feature] implement otter.ai / google meet like live caption #661

Open louis030195 opened 1 week ago

louis030195 commented 1 week ago

two paths:

  1. brute force (easy)

just open a websocket API that uses whisper or deepgram, with new audio code (not built on the existing architecture; more likely copy-pasted from it)

might use more resources? or maybe not

  2. somehow fit it into our 24/7 audio recording architecture (hard)

on the UI side, it's just another window that shows up on top of your app and streams the captions
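For the brute-force path, most of the "new audio code" would be grouping the raw sample stream into short windows before handing each one to whisper or deepgram. The sketch below is a minimal illustration under stated assumptions, not screenpipe's code: `SAMPLE_RATE`, `CHUNK_SECONDS` and `chunk_samples` are hypothetical names, and 16 kHz mono input is assumed because that is the rate Whisper models work at.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono PCM, the rate Whisper expects
CHUNK_SECONDS = 2.0   # hypothetical "short chunk duration"; lower = less latency, less context


def chunk_samples(
    samples: Iterable[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: float = CHUNK_SECONDS,
) -> Iterator[List[float]]:
    """Group a stream of PCM samples into fixed-length chunks.

    Each full chunk would be sent to the transcriber as soon as it fills up.
    The final partial chunk is yielded as-is so trailing speech is not dropped.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk duration too short for the sample rate")
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == chunk_len:
            yield buf
            buf = []
    if buf:
        yield buf
```

With these defaults, 5 seconds of audio (80,000 samples) comes out as chunks of 32,000, 32,000 and 16,000 samples. Shorter chunks make captions appear sooner but give the model less context per call, which is the main trade-off behind picking the chunk duration.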

cc @EzraEllette for opinion

linear[bot] commented 1 week ago

MED-270 [feature] implement otter.ai / google meet like live caption

EzraEllette commented 1 week ago

Deepgram streaming API, or Whisper tiny or another faster model, with a short chunk duration, and SSE for the transcript. We shouldn't need WebSockets: captions only flow from server to client, which SSE covers, and Tauri can handle any user interactions.
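To make the SSE suggestion concrete, the sketch below shows how each transcript update could be framed on a `text/event-stream` response. The framing rules (optional `id:` and `event:` fields, one `data:` line per payload line, a blank line ending the event) follow the Server-Sent Events format; the `caption` event name and the `text`/`is_final` JSON fields are assumptions for illustration, loosely modeled on the interim vs. final results that streaming speech APIs report.

```python
import json
from typing import List, Optional


def sse_event(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Format one Server-Sent Events message."""
    lines: List[str] = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload must be split into one "data:" line per line.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def caption_event(text: str, is_final: bool, seq: int) -> str:
    """Hypothetical caption payload: is_final marks text that will not be revised."""
    payload = json.dumps({"text": text, "is_final": is_final})
    return sse_event(payload, event="caption", event_id=str(seq))
```

On the client side, the caption overlay window could consume these with the browser's standard `EventSource` API inside the Tauri webview, replacing the current line on interim results and committing it on final ones.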