mediar-ai / screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind
https://screenpi.pe
MIT License

[bounty] impl STT streaming #521

Closed louis030195 closed 2 weeks ago

louis030195 commented 1 month ago

previous context: #431 #306 #374

/bounty 200

@EzraEllette

ideally this solves:

linear[bot] commented 1 month ago

MED-208 [bounty] impl STT streaming

algora-pbc[bot] commented 1 month ago

## 💎 $200 bounty • Screenpi.pe

Steps to solve:

  1. Start working: Comment /attempt #521 with your implementation plan
  2. Submit work: Create a pull request including /claim #521 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @EzraEllette | Oct 17, 2024, 6:05:47 PM | #521 |

EzraEllette commented 1 month ago

/attempt #521

πŸ‘πŸΌ I'm going to spend another hour on the screenpipe shortcut, then come to this. I spent some time planning this out yesterday, so we'll see how far it gets today.

| Algora profile | Completed bounties | Tech | Active attempts |
| --- | --- | --- | --- |
| @EzraEllette | 5 mediar-ai bounties | Rust, TypeScript, JavaScript & more | ﹟451, ﹟513 |

louis030195 commented 1 month ago
Screenshot 2024-10-17 at 16 26 12

weird stuff happening on my side

bunch of speech frames on audio output (nothing played)

EzraEllette commented 1 month ago

Can you send the audio clip from this screenshot?

louis030195 commented 1 month ago

i did not play any audio, currently facing 3 issues:

  • transcription does not work at all (for me and another user)
  • file encoding stopped working (for me and another user) -> not sure about this one
  • i can't run screenpipe without dev mode in the app ("Read-only file system", seems to be only on my end)

Screenshot 2024-10-17 at 16 39 42

update: matt also has the read-only issue, not sure what's happening

louis030195 commented 1 month ago

on main:

(env) (base) louisbeaumont@louisbeaumontme-macbook:~/Documents/screen-pipe$ ./target/release/screenpipe
2024-10-18T00:07:47.117161Z  INFO screenpipe: logging initialized
2024-10-18T00:07:47.506784Z  INFO screenpipe:   MacBook Pro Microphone (input)
2024-10-18T00:07:47.506843Z  INFO screenpipe:   Display 1 (output)
2024-10-18T00:07:47.509740Z  INFO screenpipe_server::db: Migrations executed successfully.    
2024-10-18T00:07:47.509752Z  INFO screenpipe: database initialized, will store files in /Users/louisbeaumont/.screenpipe

                                            _          
   __________________  ___  ____     ____  (_____  ___ 
  / ___/ ___/ ___/ _ \/ _ \/ __ \   / __ \/ / __ \/ _ \
 (__  / /__/ /  /  __/  __/ / / /  / /_/ / / /_/ /  __/
/____/\___/_/   \___/\___/_/ /_/  / .___/_/ .___/\___/ 
                                 /_/     /_/           

build ai apps that have the full context
open source | runs locally | developer friendly

┌─────────────────────┬────────────────────────────────────┐
│ setting             │ value                              │
├─────────────────────┼────────────────────────────────────┤
│ fps                 │ 0.2                                │
│ audio chunk duration│ 30 seconds                         │
│ video chunk duration│ 60 seconds                         │
│ port                │ 3030                               │
│ audio disabled      │ false                              │
│ vision disabled     │ false                              │
│ save text files     │ false                              │
│ audio engine        │ WhisperLargeV3Turbo                │
│ ocr engine          │ AppleNative                        │
│ vad engine          │ Silero                             │
│ vad sensitivity     │ High                               │
│ data directory      │ /Users/louisbeaumont/.screenpipe   │
│ debug mode          │ false                              │
│ telemetry           │ true                               │
│ local llm           │ false                              │
│ use pii removal     │ false                              │
│ ignored windows     │ []                                 │
│ included windows    │ []                                 │
│ friend wearable uid │ not set                            │
├─────────────────────┼────────────────────────────────────┤
│ languages           │                                    │
│                     │ all languages                      │
├─────────────────────┼────────────────────────────────────┤
│ monitors            │                                    │
│                     │ id: 1                              │
├─────────────────────┼────────────────────────────────────┤
│ audio devices       │                                    │
│                     │ MacBook Pro Microphone (input)     │
│                     │ Display 1 (output)                 │
├─────────────────────┼────────────────────────────────────┤
│ pipes               │                                    │
│                     │ (disabled) pipe-llama32-comment... │
│                     │ (disabled) pipe-screen-time-sto... │
│                     │ (disabled) pipe-email-exa-search   │
│                     │ (disabled) pipe-phi3-5-engineer... │
│                     │ (disabled) pipe-meeting-summary... │
│                     │ ... and 3 more                     │
└─────────────────────┴────────────────────────────────────┘
you are using local processing. all your data stays on your computer.

warning: telemetry is enabled. only error-level data will be sent to highlight.io.
to disable, use the --disable-telemetry flag.
2024-10-18T00:07:47.513256Z  INFO screenpipe_server::server: Server starting on 127.0.0.1:3030    
2024-10-18T00:07:47.517022Z  INFO screenpipe_audio::whisper: device = Metal(MetalDevice(DeviceId(1)))    
2024-10-18T00:07:51.931752Z  INFO screenpipe_audio::vad_engine: Initializing SileroVad...
2024-10-18T00:07:51.931809Z  INFO screenpipe_audio::vad_engine: SileroVad Model downloaded to: "/Users/louisbeaumont/Library/Caches/screenpipe/vad/silero_vad.onnx"
2024-10-18T00:07:51.952655Z  INFO screenpipe_server::video: Starting new video capture    
2024-10-18T00:07:51.952686Z  INFO screenpipe_server::video: Started capture thread    
2024-10-18T00:07:53.048206Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-07-53.mp4    
2024-10-18T00:07:57.684565Z  INFO screenpipe_server::resource_monitor: Runtime: 10s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 0%, NPU: N/A
2024-10-18T00:08:02.548318Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:08:02.548319Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:08:02.591900Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:08:02.659273Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:08:07.797626Z  INFO screenpipe_server::resource_monitor: Runtime: 20s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 23%, NPU: N/A
2024-10-18T00:08:17.894315Z  INFO screenpipe_server::resource_monitor: Runtime: 30s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 26%, NPU: N/A
2024-10-18T00:08:27.968788Z  INFO screenpipe_server::resource_monitor: Runtime: 40s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 16%, NPU: N/A
2024-10-18T00:08:30.550787Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:08:30.550788Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:08:30.587664Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:08:30.676379Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:08:32.683475Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 1)    
2024-10-18T00:08:32.683503Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 1): ()    
2024-10-18T00:08:32.683509Z  INFO screenpipe_server::core: Finished iteration 1 for device Display 1 (output)    
2024-10-18T00:08:32.687905Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:08:32.720367Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 1)    
2024-10-18T00:08:32.720382Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 1): ()    
2024-10-18T00:08:32.720385Z  INFO screenpipe_server::core: Finished iteration 1 for device MacBook Pro Microphone (input)    
2024-10-18T00:08:32.819238Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 598, frames that include speech: 409, speech duration: 40900ms, speech ratio: 0.68, min required ratio: 0.20    
2024-10-18T00:08:33.501830Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:08:35.925761Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-08-35.mp4    
2024-10-18T00:08:38.025084Z  INFO screenpipe_server::resource_monitor: Runtime: 50s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 42%, NPU: N/A
2024-10-18T00:08:41.759461Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:08:41.759482Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:08:41.759512Z  INFO screenpipe_audio::whisper:   0.0s-...:  I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a    
2024-10-18T00:08:48.084594Z  INFO screenpipe_server::resource_monitor: Runtime: 60s, Total Memory: 1% (1 GB / 37 GB), Total CPU: 41%, NPU: N/A
2024-10-18T00:08:49.927016Z  INFO screenpipe_audio::whisper: 30.0s -- 60.0s    
2024-10-18T00:08:49.927035Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:08:49.927064Z  INFO screenpipe_audio::whisper:   0.0s-...:  I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a    
2024-10-18T00:08:50.282551Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:08:50.340787Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), total audio frames processed: 300, frames that include speech: 35, speech duration: 3500ms, speech ratio: 0.12, min required ratio: 0.20    
2024-10-18T00:08:50.355553Z  INFO screenpipe_server::core: device Display 1 (output) received transcription Some(" I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a\n")    
2024-10-18T00:08:50.355618Z  INFO screenpipe_server::core: device Display 1 (output) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/Display 1 (output)_2024-10-18_00-08-49.mp4"    
2024-10-18T00:08:50.357890Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) received transcription Some("")    
2024-10-18T00:08:50.357907Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) inserting audio chunk: ""    
2024-10-18T00:08:58.157661Z  INFO screenpipe_server::resource_monitor: Runtime: 70s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 26%, NPU: N/A
2024-10-18T00:08:58.551730Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:08:58.551744Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:08:58.584655Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:08:58.669722Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:09:00.637851Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 2)    
2024-10-18T00:09:00.637873Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 2): ()    
2024-10-18T00:09:00.637877Z  INFO screenpipe_server::core: Finished iteration 2 for device Display 1 (output)    
2024-10-18T00:09:00.640354Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:00.683519Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 2)    
2024-10-18T00:09:00.683537Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 2): ()    
2024-10-18T00:09:00.683540Z  INFO screenpipe_server::core: Finished iteration 2 for device MacBook Pro Microphone (input)    
2024-10-18T00:09:00.765674Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 598, frames that include speech: 170, speech duration: 17000ms, speech ratio: 0.28, min required ratio: 0.20    
2024-10-18T00:09:01.435215Z  INFO screenpipe_audio::multilingual: detected language: "ru"    
2024-10-18T00:09:03.698518Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:03.698542Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:03.698554Z  INFO screenpipe_audio::whisper:   0.0s-17.0s:  не забудьте, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете.
2024-10-18T00:09:04.474279Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:04.474306Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:04.474312Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  Продолжение следует...
2024-10-18T00:09:04.762738Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:04.782686Z  INFO screenpipe_server::core: device Display 1 (output) received transcription Some(" не забудьте, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете, что вы не можете.\n")
2024-10-18T00:09:04.782742Z  INFO screenpipe_server::core: device Display 1 (output) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/Display 1 (output)_2024-10-18_00-09-04.mp4"    
2024-10-18T00:09:04.824993Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), total audio frames processed: 300, frames that include speech: 191, speech duration: 19100ms, speech ratio: 0.64, min required ratio: 0.20    
2024-10-18T00:09:05.481530Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:09:07.914646Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:07.914669Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:07.914678Z  INFO screenpipe_audio::whisper:   0.0s-7.0s:  long night of racial injustice. I accept this award on behalf of a civil rights movement    
2024-10-18T00:09:07.914684Z  INFO screenpipe_audio::whisper:   7.0s-16.0s:  which is moving with determination and a majestic scorn for risk and danger to establish a reign    
2024-10-18T00:09:07.914688Z  INFO screenpipe_audio::whisper:   16.0s-19.0s:  of freedom and a rule of justice.    
2024-10-18T00:09:08.214373Z  INFO screenpipe_server::resource_monitor: Runtime: 80s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 50%, NPU: N/A
2024-10-18T00:09:08.667949Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:08.667966Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:08.667971Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  So, let's go.    
2024-10-18T00:09:08.952238Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) received transcription Some(" long night of racial injustice. I accept this award on behalf of a civil rights movement which is moving with determination and a majestic scorn for risk and danger to establish a reign of freedom and a rule of justice.\n")    
2024-10-18T00:09:08.952291Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/MacBook Pro Microphone (input)_2024-10-18_00-09-08.mp4"    
2024-10-18T00:09:18.286262Z  INFO screenpipe_server::resource_monitor: Runtime: 90s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 17%, NPU: N/A
2024-10-18T00:09:20.324354Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-09-20.mp4    
2024-10-18T00:09:26.554054Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:09:26.554068Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:09:26.576186Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:09:26.658205Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:09:28.375013Z  INFO screenpipe_server::resource_monitor: Runtime: 100s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 32%, NPU: N/A
2024-10-18T00:09:28.644341Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 3)    
2024-10-18T00:09:28.645648Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 3): ()    
2024-10-18T00:09:28.645698Z  INFO screenpipe_server::core: Finished iteration 3 for device Display 1 (output)    
2024-10-18T00:09:28.651222Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:28.730917Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 3)    
2024-10-18T00:09:28.730936Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 3): ()    
2024-10-18T00:09:28.730939Z  INFO screenpipe_server::core: Finished iteration 3 for device MacBook Pro Microphone (input)    
2024-10-18T00:09:28.787250Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 599, frames that include speech: 262, speech duration: 26200ms, speech ratio: 0.44, min required ratio: 0.20    
2024-10-18T00:09:29.452026Z  INFO screenpipe_audio::multilingual: detected language: "fr"    
2024-10-18T00:09:37.575058Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:37.575078Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:37.575108Z  INFO screenpipe_audio::whisper:   0.0s-...:  Je ne suis pas de la mort, mais je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la    
2024-10-18T00:09:38.268747Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:38.268771Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:38.268775Z  INFO screenpipe_audio::whisper:   0.0s-29.0s:  ...    
2024-10-18T00:09:38.434700Z  INFO screenpipe_server::resource_monitor: Runtime: 110s, Total Memory: 2% (1 GB / 37 GB), Total CPU: 37%, NPU: N/A
2024-10-18T00:09:38.565082Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:38.610329Z  INFO screenpipe_server::core: device Display 1 (output) received transcription Some(" Je ne suis pas de la mort, mais je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la mort. Je ne suis pas de la\n")    
2024-10-18T00:09:38.610393Z  INFO screenpipe_server::core: device Display 1 (output) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/Display 1 (output)_2024-10-18_00-09-38.mp4"    
2024-10-18T00:09:38.630196Z  INFO screenpipe_audio::stt: device: MacBook Pro Microphone (input), total audio frames processed: 300, frames that include speech: 191, speech duration: 19100ms, speech ratio: 0.64, min required ratio: 0.20    
2024-10-18T00:09:39.289017Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:09:41.772842Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:09:41.772863Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:41.772877Z  INFO screenpipe_audio::whisper:   0.0s-19.2s:  I am mindful that only yesterday in Birmingham, Alabama, our children crying out for brotherhood, answered with fire hoses, snarling dogs and even death. I am mindful that only yesterday in Philadelphia, Mississippi, young people seeking help.    
2024-10-18T00:09:42.525594Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:09:42.525615Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:09:42.525620Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  So, let's go.    
2024-10-18T00:09:42.773644Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) received transcription Some(" I am mindful that only yesterday in Birmingham, Alabama, our children crying out for brotherhood, answered with fire hoses, snarling dogs and even death. I am mindful that only yesterday in Philadelphia, Mississippi, young people seeking help.\n")    
2024-10-18T00:09:42.773716Z  INFO screenpipe_server::core: device MacBook Pro Microphone (input) inserting audio chunk: "/Users/louisbeaumont/.screenpipe/data/MacBook Pro Microphone (input)_2024-10-18_00-09-42.mp4"    
2024-10-18T00:09:48.513890Z  INFO screenpipe_server::resource_monitor: Runtime: 120s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 37%, NPU: N/A
2024-10-18T00:09:54.555273Z  INFO screenpipe_audio::core: device: "Display 1 (output)"    
2024-10-18T00:09:54.555278Z  INFO screenpipe_audio::core: device: "MacBook Pro Microphone (input)"    
2024-10-18T00:09:54.598362Z  INFO screenpipe_audio::core: Recording Display 1 (output) for 30 seconds    
2024-10-18T00:09:54.685662Z  INFO screenpipe_audio::core: Recording MacBook Pro Microphone (input) for 30 seconds    
2024-10-18T00:09:56.633074Z  INFO screenpipe_server::core: Finished record_and_transcribe for device Display 1 (output) (iteration 4)    
2024-10-18T00:09:56.633099Z  INFO screenpipe_server::core: Recording complete for device Display 1 (output) (iteration 4): ()    
2024-10-18T00:09:56.633103Z  INFO screenpipe_server::core: Finished iteration 4 for device Display 1 (output)    
2024-10-18T00:09:56.636789Z  INFO screenpipe_audio::stt: device: Display 1 (output), resampling from 48000 Hz to 16000 Hz    
2024-10-18T00:09:56.707473Z  INFO screenpipe_server::core: Finished record_and_transcribe for device MacBook Pro Microphone (input) (iteration 4)    
2024-10-18T00:09:56.707491Z  INFO screenpipe_server::core: Recording complete for device MacBook Pro Microphone (input) (iteration 4): ()    
2024-10-18T00:09:56.707494Z  INFO screenpipe_server::core: Finished iteration 4 for device MacBook Pro Microphone (input)    
2024-10-18T00:09:56.763168Z  INFO screenpipe_audio::stt: device: Display 1 (output), total audio frames processed: 599, frames that include speech: 202, speech duration: 20200ms, speech ratio: 0.34, min required ratio: 0.20    
2024-10-18T00:09:57.421515Z  INFO screenpipe_audio::multilingual: detected language: "en"    
2024-10-18T00:09:58.571087Z  INFO screenpipe_server::resource_monitor: Runtime: 131s, Total Memory: 1% (0 GB / 37 GB), Total CPU: 18%, NPU: N/A
2024-10-18T00:10:03.059536Z  INFO screenpipe_server::video: Starting FFmpeg process for file: /Users/louisbeaumont/.screenpipe/data/monitor_1_2024-10-18_00-10-03.mp4    
^C2024-10-18T00:10:03.707935Z  INFO screenpipe: received ctrl+c, initiating shutdown
2024-10-18T00:10:03.707967Z  INFO screenpipe: shutdown complete
2024-10-18T00:10:03.708017Z  INFO screenpipe: received shutdown signal for recording
thread 'tokio-runtime-worker' panicked at /Users/louisbeaumont/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.1/src/runtime/blocking/shutdown.rs:51:21:
Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-10-18T00:10:05.693295Z  INFO screenpipe_audio::whisper: 0.0s -- 30.0s    
2024-10-18T00:10:05.693315Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:10:05.693342Z  INFO screenpipe_audio::whisper:   0.0s-...:  I am a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a man who is a    
2024-10-18T00:10:06.450579Z  INFO screenpipe_audio::whisper: 30.0s -- 45.0s    
2024-10-18T00:10:06.450596Z  INFO screenpipe_audio::whisper:   0.0s-0.0s:     
2024-10-18T00:10:06.450601Z  INFO screenpipe_audio::whisper:   0.0s-30.0s:  So, let's go.    
(env) (base) louisbeaumont@louisbeaumontme-macbook:~/Documents/screen-pipe$ 

kinda works: mostly a bunch of errors at start and end, and Display audio not working at all

https://www.youtube.com/watch?v=5r98tT0j1a0
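
For readers following the log above: the `screenpipe_audio::stt` lines report a speech ratio per chunk (frames that include speech divided by total frames) next to a `min required ratio: 0.20`. Below is a rough Rust sketch of that gate, not the actual screenpipe code. The type and method names are hypothetical, and the 100 ms frame length is inferred from the log (409 speech frames reported as 40900ms).

```rust
/// Length of one VAD frame in milliseconds. Assumption: inferred from the
/// log, where 409 speech frames are reported as 40900ms of speech.
const FRAME_MS: u64 = 100;

/// Per-chunk VAD summary (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpeechStats {
    pub total_frames: usize,
    pub speech_frames: usize,
}

impl SpeechStats {
    /// Fraction of frames flagged as speech; 0.0 for an empty chunk.
    pub fn speech_ratio(&self) -> f32 {
        if self.total_frames == 0 {
            return 0.0;
        }
        self.speech_frames as f32 / self.total_frames as f32
    }

    /// Total speech time in the chunk, in milliseconds.
    pub fn speech_duration_ms(&self) -> u64 {
        self.speech_frames as u64 * FRAME_MS
    }

    /// True when the chunk contains enough speech to be worth transcribing.
    pub fn should_transcribe(&self, min_ratio: f32) -> bool {
        self.speech_ratio() >= min_ratio
    }
}
```

With the numbers from the log, the first Display 1 (output) chunk (409 of 598 frames) passes the 0.20 threshold even though nothing was playing, while the first microphone chunk (35 of 300 frames) falls below it, which is consistent with its empty `Some("")` transcription. That suggests the looping text comes from the VAD flagging silent output as speech, so Whisper is handed audio with nothing to transcribe.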

EzraEllette commented 1 month ago

@louis030195 Added a repeat penalty: no repetition after 15 minutes with a massive dehumidifier running near my mic. Ran a final test: image
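
As background on what a repeat penalty does during decoding, here is a minimal Rust sketch of the common formulation: the logits of recently emitted tokens are pushed down before the next token is picked. It is an illustration under assumptions, not the change made for this issue. The function names, the penalty value, and the greedy `argmax` step are all hypothetical.

```rust
use std::collections::HashSet;

/// Lower the scores of tokens that already appear in `recent_tokens`.
/// Positive logits are divided by `penalty` and negative ones multiplied,
/// so with `penalty > 1.0` a repeated token always becomes less likely.
pub fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, recent_tokens: &[u32]) {
    let mut seen = HashSet::new();
    for &token in recent_tokens {
        // Penalize each distinct token once, however often it was emitted.
        if !seen.insert(token) {
            continue;
        }
        // Token ids outside the vocabulary are ignored.
        if let Some(logit) = logits.get_mut(token as usize) {
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

/// Greedy choice: index of the highest logit, or None for an empty slice.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}
```

In a loop like "who is a man who is a man", the same few tokens keep winning the greedy choice. Penalizing them gives the decoder a chance to pick something else, or to end the segment.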

louis030195 commented 1 month ago

@EzraEllette

2024-10-18T16:58:20.095975Z [you]  Thank you.

2024-10-18T17:00:06.000161Z [you]  Thank you.

2024-10-18T17:01:19.028590Z [you]  Thank you.

2024-10-18T17:01:46.969358Z [you]  Thank you.

2024-10-18T17:02:15.067304Z [you]  Thank you.

2024-10-18T17:02:42.902966Z [you]  Thank you.

2024-10-18T17:03:10.944181Z [you]  Thank you.

2024-10-18T17:03:38.921454Z [you]  Thank you.

2024-10-18T17:04:06.962985Z [you]  Thank you.

2024-10-18T17:04:35.053024Z [you]  Thank you.

2024-10-18T17:05:02.986029Z [you]  Thank you.

2024-10-18T17:05:31.155720Z [you]  Thank you.

2024-10-18T17:05:59.975981Z [you]  Thank you.

2024-10-18T17:06:27.002774Z [you]  Thank you.

2024-10-18T17:06:54.949682Z [you]  Thank you.

2024-10-18T17:07:22.994386Z [you]  Thank you.

2024-10-18T17:07:50.977568Z [you]  Thank you.

2024-10-18T17:08:18.920124Z [you] Thank you.
2024-10-18T17:08:46.966124Z [you]  Thank you.

2024-10-18T17:09:15.016875Z [you]  Thank you.

2024-10-18T17:09:43.062080Z [you]  Thank you.

2024-10-18T17:10:11.082950Z [you]  Thank you.

2024-10-18T17:10:38.936120Z [you]  Thank you.

2024-10-18T17:11:07.082037Z [you]  Thank you.

2024-10-18T17:11:35.627123Z [you]  Thank you.

2024-10-18T17:12:03.412783Z [you]  Thank you.

2024-10-18T17:12:31.595844Z [you]  Thank you.

2024-10-18T17:12:59.901008Z [you]  Thank you.

2024-10-18T17:13:27.467794Z [you] Thank you.
2024-10-18T17:13:55.410570Z [you]  Thank you.

2024-10-18T17:14:23.506021Z [you]  Thank you.

2024-10-18T17:14:51.247378Z [you]  Thank you.

2024-10-18T17:15:19.211681Z [you]  Thank you.

2024-10-18T17:15:47.252834Z [you]  Thank you.

2024-10-18T17:16:15.088426Z [you]  Thank you.

2024-10-18T17:16:42.931627Z [you] Thank you.
2024-10-18T17:17:11.113553Z [you]  Thank you.

2024-10-18T17:17:39.095140Z [you]  Thank you.

2024-10-18T17:18:07.087969Z [you]  Thank you.

2024-10-18T17:18:35.110100Z [you]  Thank you.

2024-10-18T17:19:03.088271Z [you]  Thank you.

2024-10-18T17:19:31.114233Z [you]  Thank you.

2024-10-18T17:19:59.164065Z [you]  Thank you.

2024-10-18T17:20:27.219669Z [you]  Thank you.

2024-10-18T17:20:55.073695Z [you]  Thank you.

2024-10-18T17:21:23.125236Z [you]  Thank you.

2024-10-18T17:21:51.140878Z [you]  Thank you.

2024-10-18T17:22:18.975286Z [you]  Thank you.

2024-10-18T17:22:47.072188Z [you]  Thank you.

2024-10-18T17:23:15.042848Z [you]  Thank you.

2024-10-18T17:23:43.074946Z [you]  Thank you.

2024-10-18T17:24:11.208082Z [you]  Thank you.

2024-10-18T17:24:39.140286Z [you]  Thank you.

2024-10-18T17:25:07.157022Z [you]  Thank you.

2024-10-18T17:25:35.131140Z [you]  Thank you.

2024-10-18T17:26:03.158740Z [you]  Thank you.

2024-10-18T17:26:31.105051Z [you]  Thank you.

2024-10-18T17:26:59.152049Z [you]  Thank you.

2024-10-18T17:27:27.086300Z [you]  Thank you.

2024-10-18T17:27:55.159696Z [you]  Thank you.

2024-10-18T17:28:23.172966Z [you]  Thank you.

2024-10-18T17:28:51.128113Z [you]  Thank you.

2024-10-18T17:29:19.122335Z [you]  Thank you.

2024-10-18T17:29:47.166743Z [you]  Thank you.

2024-10-18T17:30:15.098752Z [you]  Thank you.

2024-10-18T17:30:43.130111Z [you]  Thank you.

2024-10-18T17:31:11.141765Z [you]  Thank you.

2024-10-18T17:31:39.185230Z [you]  Thank you.

2024-10-18T17:32:07.152871Z [you]  Thank you.

2024-10-18T17:32:35.208828Z [you]  Thank you.

2024-10-18T17:33:03.137629Z [you]  Thank you.

2024-10-18T17:33:31.182693Z [you]  Thank you.

2024-10-18T17:33:59.232615Z [you]  Thank you.

2024-10-18T17:34:27.149406Z [you]  Thank you.

2024-10-18T17:34:55.119002Z [you]  Thank you.

2024-10-18T17:35:23.154382Z [you]  Thank you.

2024-10-18T17:35:51.175037Z [you]  Thank you.

2024-10-18T17:36:19.156118Z [you]  Thank you.

2024-10-18T17:36:47.159276Z [you]  Thank you.

2024-10-18T17:37:15.122607Z [you]  Thank you.

2024-10-18T17:37:43.057067Z [you]  Thank you.

2024-10-18T17:38:11.225979Z [you]  Thank you.

2024-10-18T17:38:39.058831Z [you]  Thank you.

2024-10-18T17:39:07.198580Z [you]  Thank you.

2024-10-18T17:39:35.057496Z [you]  Thank you.

2024-10-18T17:40:03.063776Z [you] Thank you.
2024-10-18T17:40:31.045272Z [you] Thank you.
2024-10-18T17:40:59.104256Z [you] Thank you.
2024-10-18T17:41:27.083779Z [you] Thank you.
2024-10-18T17:41:55.054041Z [you] Thank you.
2024-10-18T17:42:23.109249Z [you] Thank you.
2024-10-18T17:42:51.067050Z [you] Thank you.
2024-10-18T17:43:19.046349Z [you] Thank you.
2024-10-18T17:43:47.104567Z [you] Thank you.
2024-10-18T17:44:15.098849Z [you] Thank you.
2024-10-18T17:44:43.067492Z [you] Thank you.
2024-10-18T17:45:11.022362Z [you] Thank you.
2024-10-18T17:45:39.081078Z [you] Thank you.
2024-10-18T17:46:07.036418Z [you] Thank you.
2024-10-18T17:46:35.119050Z [you] Thank you.
2024-10-18T17:47:03.177475Z [you] Thank you.
2024-10-18T17:47:31.134804Z [you] Thank you.
2024-10-18T17:47:59.087799Z [you] Thank you.
2024-10-18T17:48:27.031133Z [you] Thank you.
2024-10-18T17:48:55.090640Z [you] Thank you.
2024-10-18T17:49:23.147789Z [you] Thank you.
2024-10-18T17:49:51.153614Z [you] Thank you.
2024-10-18T17:50:19.123019Z [you] Thank you.
2024-10-18T17:50:47.103815Z [you] Thank you.
2024-10-18T17:51:15.109341Z [you] Thank you.
2024-10-18T17:51:43.111662Z [you] Thank you.
2024-10-18T17:52:11.148144Z [you] Thank you.
2024-10-18T17:52:39.129620Z [you] Thank you.
2024-10-18T17:53:07.111379Z [you] Thank you.
2024-10-18T17:53:35.079087Z [you] Thank you.
2024-10-18T17:54:03.137069Z [you] Thank you.
2024-10-18T17:54:31.147487Z [you] Thank you.
2024-10-18T17:54:59.110127Z [you] Thank you.
2024-10-18T17:55:27.188364Z [you] Thank you.
2024-10-18T17:55:55.122745Z [you] Thank you.
2024-10-18T17:56:23.460820Z [you] Thank you.
2024-10-18T17:56:51.302627Z [you] Thank you.
2024-10-18T17:57:19.229304Z [you] Thank you.
2024-10-18T17:57:47.193409Z [you] Thank you.
2024-10-18T17:58:15.140614Z [you] Thank you.
2024-10-18T17:58:43.214382Z [you] Thank you.
2024-10-18T17:59:11.238589Z [you] Thank you.
2024-10-18T17:59:39.224145Z [you] Thank you.
2024-10-18T18:00:07.163133Z [you] Thank you.
2024-10-18T18:00:35.198086Z [you] Thank you.
2024-10-18T18:01:03.238787Z [you] Thank you.

this is my transcription state right now, while i had a conversation with someone else (IRL)

using mac mic and display output, and whisper turbo
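
One possible stopgap for a log like the one above is to drop a segment when it is identical to the previous one after normalization. This is only a sketch: `RepeatFilter` and `normalize` are hypothetical names, nothing in the thread says screenpipe does this, and it treats the symptom rather than the underlying transcription problem.

```rust
/// Hypothetical filter that remembers the last accepted transcript.
pub struct RepeatFilter {
    last: Option<String>,
}

impl RepeatFilter {
    pub fn new() -> Self {
        Self { last: None }
    }

    /// Returns true when the segment should be kept, false when it is
    /// empty or repeats the previously accepted segment.
    pub fn accept(&mut self, text: &str) -> bool {
        let norm = normalize(text);
        if norm.is_empty() {
            return false;
        }
        if self.last.as_deref() == Some(norm.as_str()) {
            return false;
        }
        self.last = Some(norm);
        true
    }
}

/// Lowercases, strips punctuation at word edges, and collapses whitespace,
/// so " Thank you." and "thank you" compare equal.
fn normalize(text: &str) -> String {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}
```

The trade-off is that a speaker who really says the same short phrase twice in a row loses the second occurrence, so a filter like this would need a time window or a minimum-length rule before being used for real.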

louis030195 commented 4 weeks ago

note:

image

screenpipe with deepgram used to use only 600-800 MB, now it's 4 GB, might be related to audio processing

EzraEllette commented 4 weeks ago

What command did you use to run the CLI?

louis030195 commented 4 weeks ago

./target/release/screenpipe --audio-transcription-engine deepgram \
--ocr-engine apple-native --monitor-id 1 --audio-device "MacBook Pro Microphone (input)" \
--audio-device "Display 1 (output)" --ignored-windows "bit" \
--ignored-windows ".env" --ignored-windows "Item-0" \
--ignored-windows "App Icon Window" --ignored-windows "Battery" \
--ignored-windows "Shortcuts" --ignored-windows "WiFi" \
--ignored-windows "BentoBox" --ignored-windows "Clock" \
--ignored-windows "Dock" --ignored-windows "DeepL" \
--deepgram-api-key "abcd" --language english

EzraEllette commented 3 weeks ago

image

Transcribe with deepgram wasn't using that much memory for me

louis030195 commented 3 weeks ago

> image
>
> Transcribe with deepgram wasn't using that much memory for me

resource monitor is unreliable for memory

louis030195 commented 3 weeks ago

/tip $100 @EzraEllette

thx for the work on streaming, i'm finishing up things in #578

  • [x] websocket transcription api
  • [x] refactor to use VAD in audio device
  • [x] refactor to make audio pipeline easier to benchmark (accuracy) end-to-end
  • [ ] good benchmark
  • [ ] improve accuracy
  • [ ] test on windows and linux and other macOS
  • [ ] release new app version
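
For the benchmark and accuracy items above, a common metric is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is an assumption about what such a benchmark could measure; the function name is hypothetical and this is not the benchmark code from #578.

```rust
/// Hypothetical helper: word error rate of `hypothesis` against `reference`.
/// Returns None when the reference has no words, since WER is undefined there.
pub fn word_error_rate(reference: &str, hypothesis: &str) -> Option<f64> {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();
    if r.is_empty() {
        return None;
    }

    // prev[j] = edit distance between the reference words seen so far
    // and the first j hypothesis words (Levenshtein, two-row variant).
    let mut prev: Vec<usize> = (0..=h.len()).collect();
    for (i, rw) in r.iter().enumerate() {
        let mut cur = vec![i + 1; h.len() + 1];
        for (j, hw) in h.iter().enumerate() {
            let substitution = prev[j] + usize::from(rw != hw);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur[j + 1] = substitution.min(deletion).min(insertion);
        }
        prev = cur;
    }

    Some(prev[h.len()] as f64 / r.len() as f64)
}
```

The comparison here is case- and punctuation-sensitive. A real benchmark would likely normalize both strings first, otherwise "Thank you." and "thank you" count as two errors.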

algora-pbc[bot] commented 3 weeks ago

🎉🎈 @EzraEllette has been awarded $100! 🎈🎊