mediar-ai / screenpipe

24/7 local AI screen & mic recording. Build AI apps that have the full context. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
https://screenpi.pe
MIT License
6.4k stars 328 forks

audio noise (repetitions, garbage) #38

Closed louis030195 closed 1 month ago

louis030195 commented 2 months ago

transcription works well for meetings atm (captures 80% of what's actually discussed)

but noise that shouldn't be there often gets added to the db

seems like the model's "no speech token" is not enough

related to https://github.com/louis030195/screen-pipe/issues/30

louis030195 commented 2 months ago

a simple hack could be to keep a list of the garbage we've noticed whisper output and check against it with an if, e.g. when it's outputting weird japanese stuff or things like:

[2024-07-12T08:34:08Z INFO  screenpipe_audio::stt]   0.0s-0.0s:
[2024-07-12T08:34:08Z INFO  screenpipe_audio::stt]   0.0s-...:  1.5cm x 3.5cm x 3.5cm x 3.5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm x 5cm
[2024-07-12T08:34:08Z INFO  screenpipe_audio::stt] 30.0s -- 60.0s
[2024-07-12T08:34:08Z INFO  screenpipe_audio::stt]   0.0s-0.0s:
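A rough sketch of that hack, in Python for brevity, might look like the following. Everything here is an assumption for illustration, not screenpipe code: the names `KNOWN_GARBAGE`, `max_repeat_run`, and `is_noise` are hypothetical, the blocklist entry is just a phrase seen in this thread, and the repetition check goes one step beyond the plain "list + if" idea because every example in this issue is a phrase repeated back to back. Thresholds would need tuning on real transcripts.

```python
import re

# Hypothetical blocklist of phrases seen in bad whisper output (lowercase).
KNOWN_GARBAGE = ("music music music",)


def max_repeat_run(text: str, max_ngram: int = 8) -> int:
    """Longest run of a single word n-gram repeated back to back."""
    words = re.findall(r"\S+", text.lower())
    best = 1 if words else 0
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            run = 1
            # Count how many times the same n-gram follows itself.
            while words[i + run * n:i + (run + 1) * n] == gram:
                run += 1
            best = max(best, run)
    return best


def is_noise(text: str, min_run: int = 4) -> bool:
    """True if a transcript segment looks like garbage, not speech."""
    normalized = " ".join(text.lower().split())
    if not normalized:
        return True  # assumption: empty segments are not worth storing
    if any(phrase in normalized for phrase in KNOWN_GARBAGE):
        return True
    return max_repeat_run(normalized) >= min_run
```

A segment would be checked with `is_noise(segment_text)` before it is written to the db. The blocklist catches known hallucinations cheaply, while the run check catches new ones of the same repeated shape.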

another idea of mine is to use a small LLM (SLM?) to filter this garbage more broadly - i think we need to do some tests and make sure we:

  1. don't use too much compute
  2. don't turn the codebase into a mess of models (think more "unix pipes")

louis030195 commented 2 months ago

music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music music

chrisperfer commented 2 months ago

I got a zillion of these: "I have to go to the hospital I have to go to the hospital I have to go to the hospital..." Gave me chills.

louis030195 commented 2 months ago

haha

if you build stuff on top of screenpipe you can just do some filtering with an LLM, like:

llm "remove shit noise from audio"
llm "based on this data .... answer user question"
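For an app built on top of screenpipe, that two-step idea could be sketched roughly like this in Python. This is only an illustration under assumptions: `llm` stands for any text-in/text-out model call you supply (for example a thin wrapper around a local model), and `answer_from_audio` and the prompt wording are hypothetical, not part of screenpipe.

```python
from typing import Callable

# Assumption: any function that takes a prompt and returns the model's text.
LLM = Callable[[str], str]

CLEAN_PROMPT = (
    "Remove noise (repetitions, garbage) from this audio transcript. "
    "Return only the cleaned transcript.\n\n{transcript}"
)
ANSWER_PROMPT = (
    "Based on this data:\n{data}\n\n"
    "Answer the user's question: {question}"
)


def answer_from_audio(llm: LLM, transcript: str, question: str) -> str:
    """Clean the transcript with one call, then answer from the cleaned text."""
    cleaned = llm(CLEAN_PROMPT.format(transcript=transcript))
    return llm(ANSWER_PROMPT.format(data=cleaned, question=question))
```

Keeping the cleaning step as its own call means it can be swapped for a cheaper model or a plain heuristic without touching the question-answering step.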