ronaldeddings closed this 3 weeks ago
@ronaldeddings thanks for the feedback
in my experience screenpipe uses 8 GB of memory (MacBook Pro M3 Max, 32 GB), but it's possible that we have usage spikes
we will look into this
PS: atm you can reduce CPU usage (and probably memory) by using dev mode and lowering the --fps arg (default 0.2 on Mac, i.e. 1 frame every 5 s; 0.1 would be 1 frame every 10 s; 1 fps is 1 frame per second; higher frequency = higher CPU/memory usage), or by disabling audio with --disable-audio.
you can also use cloud OCR using --ocr-engine unstructured or cloud STT using --audio-transcription-engine deepgram (they also provide higher quality)
we provide free cloud usage for a few months
I think the biggest consumer is OCR atm (could be wrong)
we're going to make these settings available in non dev mode soon
thanks for the patience
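The --fps numbers above follow from capture interval = 1 / fps. A tiny sketch of that arithmetic; `frame_interval_secs` and `frames_per_hour` are hypothetical helpers for illustration, not screenpipe functions:

```rust
/// Seconds between two captured frames for a given --fps value (interval = 1 / fps).
fn frame_interval_secs(fps: f64) -> f64 {
    assert!(fps > 0.0, "fps must be positive");
    1.0 / fps
}

/// Frames captured per hour at a given --fps value.
fn frames_per_hour(fps: f64) -> f64 {
    fps * 3600.0
}
```

At the Mac default of 0.2 this gives a 5 s interval and 720 frames per hour; 0.1 gives 10 s and 360 frames per hour, and 1 fps gives 3600, which is why lowering --fps reduces the capture workload.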
Comment /attempt #183 with your implementation plan, and put /claim #183 in the PR body to claim the bounty. Thank you for contributing to mediar-ai/screenpipe!
Memory leak is the priority in this issue [addressed to bounty contributors]
Hi @m13v @louis030195, can you give a brief description of how to reproduce this? Is it possible to do it on a Windows computer?
I think it's the same on all OSes: just keep screenpipe running, and it will accumulate more and more RAM. At first it's 1 GB, after 10 minutes it's 2 GB, after 1 hour it's 4 GB.
It's been running for 21 minutes; when it started it was using only 300 MB. What am I doing wrong? Video is not being captured: I checked the database and .screenpipe/data, and no files are being created. I also checked via the app, and no files are created there either. I was also wondering how I could change the transcription model, maybe to medium. And maybe the problem is that it's using Tesseract as the OCR engine instead of native OCR?
(base) PS D:\projects\screen-pipe> .\target\release\screenpipe.exe
[2024-08-23T09:23:59Z WARN screenpipe] Screenpipe hasn't been extensively tested on this OS. We'd love your feedback!
Would love your feedback on the UX, let's a 15 min call soon:
https://cal.com/louis030195/screenpipe
[2024-08-23T09:23:59Z INFO screenpipe] Microphone (4- High Definition Audio Device) (input)
[2024-08-23T09:23:59Z INFO screenpipe] BenQ EL2870U (NVIDIA High Definition Audio) (output)
[2024-08-23T09:23:59Z INFO screenpipe_server::db] Migrations executed successfully.
[2024-08-23T09:23:59Z INFO screenpipe] Database initialized, will store files in C:\Users\ABC\.screenpipe
[2024-08-23T09:23:59Z INFO screenpipe] Server started on http://localhost:3030
Build AI apps that have the full context
Open source | Runs locally | Developer friendly
┌──────────────────────┬────────────────────────────────────┐
│ Setting              │ Value                              │
├──────────────────────┼────────────────────────────────────┤
│ FPS                  │ 1                                  │
│ Audio Chunk Duration │ 30 seconds                         │
│ Port                 │ 3030                               │
│ Audio Disabled       │ false                              │
│ Self Healing         │ false                              │
│ Save Text Files      │ false                              │
│ Audio Engine         │ WhisperTiny                        │
│ OCR Engine           │ Tesseract                          │
│ Monitor ID           │ 65537                              │
│ Data Directory       │ C:\Users\ABC\.screenpipe           │
│ Debug Mode           │ false                              │
├──────────────────────┼────────────────────────────────────┤
│ Audio Devices        │                                    │
│                      │ Microphone (4- High Definition ... │
│                      │ BenQ EL2870U (NVIDIA High Defin... │
└──────────────────────┴────────────────────────────────────┘
[2024-08-23T09:23:59Z INFO screenpipe_server::server] Starting server on 0.0.0.0:3030
You are using local processing. All your data stays on your computer.
[2024-08-23T09:24:00Z INFO screenpipe_audio::stt] device = Cuda(CudaDevice(DeviceId(1)))
[2024-08-23T09:24:00Z INFO hf_hub] Token file not found "C:\\Users\\ABC\\.cache\\huggingface\\token"
[2024-08-23T09:24:00Z INFO screenpipe_server::video] Starting new video capture
[2024-08-23T09:24:00Z INFO screenpipe_server::video] Started capture thread
[2024-08-23T09:24:10Z INFO screenpipe_server::resource_monitor] Runtime: 10s, Total Memory: 2% (0.45 GB / 23.94 GB), Total CPU: 102%
[2024-08-23T09:24:14Z INFO screenpipe_audio::core] device: "Microphone (4- High Definition Audio Device) (input)"
[2024-08-23T09:24:14Z INFO screenpipe_audio::core] device: "BenQ EL2870U (NVIDIA High Definition Audio) (output)"
[2024-08-23T09:24:14Z INFO screenpipe_audio::core] Recording Microphone (4- High Definition Audio Device) (input) for 30 seconds
[2024-08-23T09:24:14Z INFO screenpipe_audio::core] Recording BenQ EL2870U (NVIDIA High Definition Audio) (output) for 30 seconds
[2024-08-23T09:24:20Z INFO screenpipe_server::resource_monitor] Runtime: 20s, Total Memory: 2% (0.44 GB / 23.94 GB), Total CPU: 107%
[2024-08-23T09:24:30Z INFO screenpipe_server::resource_monitor] Runtime: 30s, Total Memory: 2% (0.45 GB / 23.94 GB), Total CPU: 102%
[2024-08-23T09:24:40Z INFO screenpipe_server::resource_monitor] Runtime: 40s, Total Memory: 2% (0.45 GB / 23.94 GB), Total CPU: 102%
[2024-08-23T09:24:44Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-24-14.mp4. Now triggering transcription
[2024-08-23T09:24:44Z INFO screenpipe_server::core] Finished record_and_transcribe for device Microphone (4- High Definition Audio Device) (input) (iteration 1)
[2024-08-23T09:24:44Z INFO screenpipe_server::core] Recording complete for device Microphone (4- High Definition Audio Device) (input) (iteration 1): "C:\\Users\\ABC\\.screenpipe\\data\\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-24-14.mp4"
[2024-08-23T09:24:44Z INFO screenpipe_server::core] Finished iteration 1 for device Microphone (4- High Definition Audio Device) (input)
[2024-08-23T09:24:44Z INFO screenpipe_audio::core] device: "Microphone (4- High Definition Audio Device) (input)"
[2024-08-23T09:24:44Z INFO screenpipe_audio::core] Recording Microphone (4- High Definition Audio Device) (input) for 30 seconds
[2024-08-23T09:24:44Z INFO screenpipe_audio::stt] Resampling from 44100 Hz to 16000 Hz
[2024-08-23T09:24:44Z INFO screenpipe_audio::stt] Total audio_frames processed: 3002, frames that include speech: 256
[2024-08-23T09:24:44Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\BenQ EL2870U (NVIDIA High Definition Audio) (output)_2024-08-23_09-24-14.mp4. Now triggering transcription
[2024-08-23T09:24:44Z INFO screenpipe_server::core] Finished record_and_transcribe for device BenQ EL2870U (NVIDIA High Definition Audio) (output) (iteration 1)
[2024-08-23T09:24:44Z INFO screenpipe_server::core] Recording complete for device BenQ EL2870U (NVIDIA High Definition Audio) (output) (iteration 1): "C:\\Users\\ABC\\.screenpipe\\data\\BenQ EL2870U (NVIDIA High Definition Audio) (output)_2024-08-23_09-24-14.mp4"
[2024-08-23T09:24:44Z INFO screenpipe_server::core] Finished iteration 1 for device BenQ EL2870U (NVIDIA High Definition Audio) (output)
[2024-08-23T09:24:44Z INFO screenpipe_audio::core] device: "BenQ EL2870U (NVIDIA High Definition Audio) (output)"
[2024-08-23T09:24:45Z INFO screenpipe_audio::core] Recording BenQ EL2870U (NVIDIA High Definition Audio) (output) for 30 seconds
[2024-08-23T09:24:45Z INFO screenpipe_audio::multilingual] detected language: ("nn", "nynorsk")
[2024-08-23T09:24:45Z INFO screenpipe_audio::stt] 0.0s -- 30.0s
[2024-08-23T09:24:45Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T09:24:45Z INFO screenpipe_audio::stt] 0.0s-2.0s: See you next time!
[2024-08-23T09:24:45Z ERROR screenpipe_audio::stt] STT error for input C:\Users\ABC\.screenpipe\data\BenQ EL2870U (NVIDIA High Definition Audio) (output)_2024-08-23_09-24-14.mp4: no supported audio tracks
[2024-08-23T09:24:45Z INFO screenpipe_server::core] Received transcription
@m13v
@chandeldivyam
[2024-08-23T09:24:45Z ERROR screenpipe_audio::stt] STT error for input C:\Users\ABC\.screenpipe\data\BenQ EL2870U (NVIDIA High Definition Audio) (output)_2024-08-23_09-24-14.mp4: no supported audio tracks
there is an error with audio
@m13v @chandeldivyam the first task for this issue is to have repeatable measurement of performance, otherwise we're just optimising blindly
example to track accuracy & speed of OCR
https://github.com/mediar-ai/screenpipe/blob/main/screenpipe-vision/benches/ocr_benchmark.rs
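The linked benchmark runs as a cargo bench target. As a rough std-only illustration of what "repeatable measurement" means for a single function, a minimal timing harness could look like this; `bench` and `BenchStats` are hypothetical names, not the project's benchmark code, and a real benchmark would more likely use a dedicated harness:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Wall-clock timing summary for one benchmarked function.
#[derive(Debug)]
struct BenchStats {
    iterations: u32,
    mean: Duration,
    min: Duration,
    max: Duration,
}

/// Run `f` once as a warm-up (not measured), then `iterations` more times,
/// recording mean / fastest / slowest wall-clock time per call.
fn bench<T, F: FnMut() -> T>(iterations: u32, mut f: F) -> BenchStats {
    assert!(iterations > 0, "need at least one measured iteration");
    black_box(f()); // warm-up call, excluded from the stats
    let mut total = Duration::ZERO;
    let mut fastest = Duration::MAX;
    let mut slowest = Duration::ZERO;
    for _ in 0..iterations {
        let start = Instant::now();
        black_box(f()); // stop the optimizer from discarding the work
        let elapsed = start.elapsed();
        total += elapsed;
        fastest = fastest.min(elapsed);
        slowest = slowest.max(elapsed);
    }
    BenchStats { iterations, mean: total / iterations, min: fastest, max: slowest }
}
```

Running the same function with the same input and iteration count before and after a change gives numbers that can be compared directly, which is the point of measuring before optimising.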
That's actually a very weird thing; I'm not sure whether it's Windows or just my computer. cpal goes crazy.
If there is no audio (I'm not on a call or watching a YouTube video), i.e. no system audio at all, then cpal never invokes the stream's callback, so the output device's recording contains no audio samples. I'm not sure whether this happens only on my computer or on Windows in general.
In a previous project, I worked around this by artificially adding some sample vectors without sound.
Check write_to_file there: at the end, I added silence so that this issue doesn't come up.
For the time being I will play a YouTube video in the background so I can observe the memory issue, but I am very certain why the current error occurs.
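The silence workaround described above can be sketched as follows, assuming the recorder accumulates interleaved f32 samples before encoding. `pad_with_silence` is a hypothetical name; the actual change to write_to_file isn't shown in this thread:

```rust
/// Hypothetical helper: make sure an interleaved f32 buffer covers the full
/// chunk duration by appending zeros (digital silence). If the capture callback
/// never fired, `samples` is empty and the chunk becomes pure silence instead of
/// a file with no audio at all.
fn pad_with_silence(samples: &mut Vec<f32>, sample_rate: u32, channels: u16, duration_secs: u32) {
    // Total interleaved samples expected for the whole chunk.
    let expected = sample_rate as usize * channels as usize * duration_secs as usize;
    if samples.len() < expected {
        // Keep whatever was captured and fill the remainder with 0.0.
        samples.resize(expected, 0.0);
    }
}
```

Buffers that already cover the chunk are left untouched, so real audio is never truncated.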
So I started a video, i.e. the output device now has audio, and that error is gone.
(base) PS D:\projects\screen-pipe> .\target\release\screenpipe.exe
[2024-08-23T09:53:42Z WARN screenpipe] Screenpipe hasn't been extensively tested on this OS. We'd love your feedback!
Would love your feedback on the UX, let's a 15 min call soon:
https://cal.com/louis030195/screenpipe
[2024-08-23T09:53:42Z INFO screenpipe] Microphone (4- High Definition Audio Device) (input)
[2024-08-23T09:53:42Z INFO screenpipe] Headphones (4- High Definition Audio Device) (output)
[2024-08-23T09:53:42Z INFO screenpipe_server::db] Migrations executed successfully.
[2024-08-23T09:53:42Z INFO screenpipe] Database initialized, will store files in C:\Users\ABC\.screenpipe
[2024-08-23T09:53:42Z INFO screenpipe] Server started on http://localhost:3030
[2024-08-23T09:53:42Z INFO screenpipe_server::server] Starting server on 0.0.0.0:3030
Build AI apps that have the full context
Open source | Runs locally | Developer friendly
┌──────────────────────┬────────────────────────────────────┐
│ Setting              │ Value                              │
├──────────────────────┼────────────────────────────────────┤
│ FPS                  │ 1                                  │
│ Audio Chunk Duration │ 30 seconds                         │
│ Port                 │ 3030                               │
│ Audio Disabled       │ false                              │
│ Self Healing         │ false                              │
│ Save Text Files      │ false                              │
│ Audio Engine         │ WhisperTiny                        │
│ OCR Engine           │ Tesseract                          │
│ Monitor ID           │ 65537                              │
│ Data Directory       │ C:\Users\ABC\.screenpipe           │
│ Debug Mode           │ false                              │
├──────────────────────┼────────────────────────────────────┤
│ Audio Devices        │                                    │
│                      │ Microphone (4- High Definition ... │
│                      │ Headphones (4- High Definition ... │
└──────────────────────┴────────────────────────────────────┘
You are using local processing. All your data stays on your computer.
[2024-08-23T09:53:42Z INFO screenpipe_audio::stt] device = Cuda(CudaDevice(DeviceId(1)))
[2024-08-23T09:53:42Z INFO hf_hub] Token file not found "C:\\Users\\ABC\\.cache\\huggingface\\token"
[2024-08-23T09:53:42Z INFO screenpipe_server::video] Starting new video capture
[2024-08-23T09:53:42Z INFO screenpipe_server::video] Started capture thread
[2024-08-23T09:53:53Z INFO screenpipe_server::resource_monitor] Runtime: 10s, Total Memory: 2% (0.41 GB / 23.94 GB), Total CPU: 108%
[2024-08-23T09:53:57Z INFO screenpipe_audio::core] device: "Microphone (4- High Definition Audio Device) (input)"
[2024-08-23T09:53:57Z INFO screenpipe_audio::core] device: "Headphones (4- High Definition Audio Device) (output)"
[2024-08-23T09:53:57Z INFO screenpipe_audio::core] Recording Microphone (4- High Definition Audio Device) (input) for 30 seconds
[2024-08-23T09:53:57Z INFO screenpipe_audio::core] Recording Headphones (4- High Definition Audio Device) (output) for 30 seconds
[2024-08-23T09:54:03Z INFO screenpipe_server::resource_monitor] Runtime: 20s, Total Memory: 2% (0.44 GB / 23.94 GB), Total CPU: 105%
[2024-08-23T09:54:13Z INFO screenpipe_server::resource_monitor] Runtime: 30s, Total Memory: 2% (0.44 GB / 23.94 GB), Total CPU: 105%
[2024-08-23T09:54:23Z INFO screenpipe_server::resource_monitor] Runtime: 40s, Total Memory: 2% (0.44 GB / 23.94 GB), Total CPU: 104%
[2024-08-23T09:54:27Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-53-57.mp4. Now triggering transcription
[2024-08-23T09:54:27Z INFO screenpipe_server::core] Finished record_and_transcribe for device Microphone (4- High Definition Audio Device) (input) (iteration 1)
[2024-08-23T09:54:27Z INFO screenpipe_server::core] Recording complete for device Microphone (4- High Definition Audio Device) (input) (iteration 1): "C:\\Users\\ABC\\.screenpipe\\data\\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-53-57.mp4"
[2024-08-23T09:54:27Z INFO screenpipe_server::core] Finished iteration 1 for device Microphone (4- High Definition Audio Device) (input)
[2024-08-23T09:54:27Z INFO screenpipe_audio::core] device: "Microphone (4- High Definition Audio Device) (input)"
[2024-08-23T09:54:27Z INFO screenpipe_audio::core] Recording Microphone (4- High Definition Audio Device) (input) for 30 seconds
[2024-08-23T09:54:27Z INFO screenpipe_audio::stt] Resampling from 44100 Hz to 16000 Hz
[2024-08-23T09:54:27Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-53-57.mp4. Now triggering transcription
[2024-08-23T09:54:27Z INFO screenpipe_server::core] Finished record_and_transcribe for device Headphones (4- High Definition Audio Device) (output) (iteration 1)
[2024-08-23T09:54:27Z INFO screenpipe_server::core] Recording complete for device Headphones (4- High Definition Audio Device) (output) (iteration 1): "C:\\Users\\ABC\\.screenpipe\\data\\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-53-57.mp4"
[2024-08-23T09:54:27Z INFO screenpipe_server::core] Finished iteration 1 for device Headphones (4- High Definition Audio Device) (output)
[2024-08-23T09:54:27Z INFO screenpipe_audio::core] device: "Headphones (4- High Definition Audio Device) (output)"
[2024-08-23T09:54:27Z INFO screenpipe_audio::stt] Total audio_frames processed: 3002, frames that include speech: 385
[2024-08-23T09:54:27Z INFO screenpipe_audio::core] Recording Headphones (4- High Definition Audio Device) (output) for 30 seconds
[2024-08-23T09:54:27Z INFO screenpipe_audio::multilingual] detected language: ("en", "english")
[2024-08-23T09:54:27Z INFO screenpipe_audio::stt] no speech detected, skipping 3000 DecodingResult { tokens: [50258, 50259, 50359, 50364, 307, 322, 264, 6191, 3199, 13, 50464, 50257], text: "<|0.00|> is on the technical table.<|2.00|>", avg_logprob: -1.3954070615976575, no_speech_prob: 0.7845484018325806, temperature: 0.0, compression_ratio: NaN }
[2024-08-23T09:54:27Z INFO screenpipe_audio::stt] Resampling from 48000 Hz to 16000 Hz
[2024-08-23T09:54:28Z INFO screenpipe_audio::stt] Total audio_frames processed: 3001, frames that include speech: 2204
[2024-08-23T09:54:28Z INFO screenpipe_server::core] Received transcription
[2024-08-23T09:54:28Z INFO screenpipe_server::core] Inserting audio chunk: "C:\\Users\\ABC\\.screenpipe\\data\\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-53-57.mp4"
[2024-08-23T09:54:28Z INFO screenpipe_audio::multilingual] detected language: ("en", "english")
[2024-08-23T09:54:29Z INFO screenpipe_audio::stt] 0.0s -- 30.0s
[2024-08-23T09:54:29Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T09:54:29Z INFO screenpipe_audio::stt] 0.0s-21.8s: is the longest podcast I've ever done. It's a fascinating, super technical and wide-ranging conversation. And I loved every minute of it. And now dear friends, here's Elon Musk. It's fifth time on this, the Lex Friedman podcast. Drink a cup of your water. Water. I'm so over caffeinated right now. Do you want some caffeine? I mean, sure. There's a, there's a nitro drink.
[2024-08-23T09:54:29Z INFO screenpipe_audio::stt] no speech detected, skipping 4500 DecodingResult { tokens: [50258, 50259, 50359, 50364, 291, 13, 50464, 50257], text: "<|0.00|> you.<|2.00|>", avg_logprob: -1.3437390944148888, no_speech_prob: 0.9409240484237671, temperature: 0.0, compression_ratio: NaN }
[2024-08-23T09:54:29Z INFO screenpipe_server::core] Received transcription
[2024-08-23T09:54:29Z INFO screenpipe_server::core] Inserting audio chunk: "C:\\Users\\ABC\\.screenpipe\\data\\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-53-57.mp4"
[2024-08-23T09:54:29Z INFO screenpipe_server::db] Successfully chunked audio transcription into 3 chunks
[2024-08-23T09:54:33Z INFO screenpipe_server::resource_monitor] Runtime: 50s, Total Memory: 2% (0.56 GB / 23.94 GB), Total CPU: 139%
[2024-08-23T09:54:53Z INFO screenpipe_server::resource_monitor] Runtime: 70s, Total Memory: 2% (0.56 GB / 23.94 GB), Total CPU: 107%
[2024-08-23T09:54:57Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-54-27.mp4. Now triggering transcription
[2024-08-23T09:54:57Z INFO screenpipe_server::core] Finished record_and_transcribe for device Microphone (4- High Definition Audio Device) (input) (iteration 2)
[2024-08-23T09:54:57Z INFO screenpipe_server::core] Recording complete for device Microphone (4- High Definition Audio Device) (input) (iteration 2): "C:\\Users\\ABC\\.screenpipe\\data\\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-54-27.mp4"
[2024-08-23T09:54:57Z INFO screenpipe_server::core] Finished iteration 2 for device Microphone (4- High Definition Audio Device) (input)
[2024-08-23T09:54:57Z INFO screenpipe_audio::core] device: "Microphone (4- High Definition Audio Device) (input)"
[2024-08-23T09:54:57Z INFO screenpipe_audio::core] Recording Microphone (4- High Definition Audio Device) (input) for 30 seconds
[2024-08-23T09:54:57Z INFO screenpipe_audio::stt] Resampling from 44100 Hz to 16000 Hz
[2024-08-23T09:54:57Z INFO screenpipe_audio::stt] Total audio_frames processed: 3002, frames that include speech: 397
[2024-08-23T09:54:57Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-54-27.mp4. Now triggering transcription
[2024-08-23T09:54:57Z INFO screenpipe_server::core] Finished record_and_transcribe for device Headphones (4- High Definition Audio Device) (output) (iteration 2)
[2024-08-23T09:54:57Z INFO screenpipe_server::core] Recording complete for device Headphones (4- High Definition Audio Device) (output) (iteration 2): "C:\\Users\\ABC\\.screenpipe\\data\\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-54-27.mp4"
[2024-08-23T09:54:57Z INFO screenpipe_server::core] Finished iteration 2 for device Headphones (4- High Definition Audio Device) (output)
[2024-08-23T09:54:57Z INFO screenpipe_audio::core] device: "Headphones (4- High Definition Audio Device) (output)"
[2024-08-23T09:54:57Z INFO screenpipe_audio::core] Recording Headphones (4- High Definition Audio Device) (output) for 30 seconds
[2024-08-23T09:54:57Z INFO screenpipe_audio::multilingual] detected language: ("en", "english")
[2024-08-23T09:54:58Z INFO screenpipe_audio::stt] 0.0s -- 30.0s
[2024-08-23T09:54:58Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T09:54:58Z INFO screenpipe_audio::stt] 0.0s-4.0s: I need no idea how to get rid of it. I need no idea how to get rid of it. I think I have to really everything.
[2024-08-23T09:54:58Z INFO screenpipe_audio::stt] Resampling from 48000 Hz to 16000 Hz
[2024-08-23T09:54:58Z INFO screenpipe_audio::stt] Total audio_frames processed: 3003, frames that include speech: 2385
[2024-08-23T09:54:58Z INFO screenpipe_server::core] Received transcription
[2024-08-23T09:54:58Z INFO screenpipe_server::core] Inserting audio chunk: "C:\\Users\\ABC\\.screenpipe\\data\\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-54-27.mp4"
[2024-08-23T09:54:58Z INFO screenpipe_server::db] Successfully chunked audio transcription into 1 chunks
[2024-08-23T09:54:58Z INFO screenpipe_audio::multilingual] detected language: ("en", "english")
[2024-08-23T09:54:59Z INFO screenpipe_audio::stt] 0.0s -- 30.0s
[2024-08-23T09:54:59Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T09:54:59Z INFO screenpipe_audio::stt] 0.0s-...: This was to keep you up for like, you know, tomorrow afternoon basically. Yeah, I don't know. So what is nitro? It's just got a lot of caffeine. Don't ask questions. It's called nitro. Do you need to know anything else? It's got nitrogen. That's ridiculous. I mean, what we breathe is 78% not just anyway. What do you need to add more? What's the most people think they have the really oxygen and they're actually breathing 70%?
[2024-08-23T09:54:59Z INFO screenpipe_audio::stt] no speech detected, skipping 4500 DecodingResult { tokens: [50258, 50259, 50359, 50364, 291, 13, 50464, 50257], text: "<|0.00|> you.<|2.00|>", avg_logprob: -1.3308436969938136, no_speech_prob: 0.9387331008911133, temperature: 0.0, compression_ratio: NaN }
[2024-08-23T09:54:59Z INFO screenpipe_server::core] Received transcription
[2024-08-23T09:54:59Z INFO screenpipe_server::core] Inserting audio chunk: "C:\\Users\\ABC\\.screenpipe\\data\\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-54-27.mp4"
[2024-08-23T09:54:59Z INFO screenpipe_server::db] Successfully chunked audio transcription into 3 chunks
[2024-08-23T09:55:03Z INFO screenpipe_server::resource_monitor] Runtime: 80s, Total Memory: 2% (0.56 GB / 23.94 GB), Total CPU: 139%
[2024-08-23T09:55:13Z INFO screenpipe_server::resource_monitor] Runtime: 90s, Total Memory: 2% (0.56 GB / 23.94 GB), Total CPU: 104%
[2024-08-23T09:55:23Z INFO screenpipe_server::resource_monitor] Runtime: 100s, Total Memory: 3% (0.62 GB / 23.94 GB), Total CPU: 102%
[2024-08-23T09:55:27Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-54-57.mp4. Now triggering transcription
[2024-08-23T09:55:27Z INFO screenpipe_server::core] Finished record_and_transcribe for device Microphone (4- High Definition Audio Device) (input) (iteration 3)
[2024-08-23T09:55:27Z INFO screenpipe_server::core] Recording complete for device Microphone (4- High Definition Audio Device) (input) (iteration 3): "C:\\Users\\ABC\\.screenpipe\\data\\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-54-57.mp4"
[2024-08-23T09:55:27Z INFO screenpipe_server::core] Finished iteration 3 for device Microphone (4- High Definition Audio Device) (input)
[2024-08-23T09:55:27Z INFO screenpipe_audio::core] device: "Microphone (4- High Definition Audio Device) (input)"
[2024-08-23T09:55:27Z INFO screenpipe_audio::core] Recording Microphone (4- High Definition Audio Device) (input) for 30 seconds
[2024-08-23T09:55:27Z INFO screenpipe_audio::stt] Resampling from 44100 Hz to 16000 Hz
[2024-08-23T09:55:27Z INFO screenpipe_audio::stt] Total audio_frames processed: 3002, frames that include speech: 404
[2024-08-23T09:55:27Z INFO screenpipe_audio::core] Recording stopped, wrote to C:\Users\ABC\.screenpipe\data\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-54-57.mp4. Now triggering transcription
[2024-08-23T09:55:27Z INFO screenpipe_server::core] Finished record_and_transcribe for device Headphones (4- High Definition Audio Device) (output) (iteration 3)
[2024-08-23T09:55:27Z INFO screenpipe_server::core] Recording complete for device Headphones (4- High Definition Audio Device) (output) (iteration 3): "C:\\Users\\ABC\\.screenpipe\\data\\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-54-57.mp4"
[2024-08-23T09:55:27Z INFO screenpipe_server::core] Finished iteration 3 for device Headphones (4- High Definition Audio Device) (output)
[2024-08-23T09:55:27Z INFO screenpipe_audio::core] device: "Headphones (4- High Definition Audio Device) (output)"
[2024-08-23T09:55:27Z INFO screenpipe_audio::core] Recording Headphones (4- High Definition Audio Device) (output) for 30 seconds
[2024-08-23T09:55:27Z INFO screenpipe_audio::multilingual] detected language: ("en", "english")
[2024-08-23T09:55:27Z INFO screenpipe_audio::stt] 0.0s -- 30.0s
[2024-08-23T09:55:27Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T09:55:27Z INFO screenpipe_audio::stt] 0.0s-1.0s: Thank you.
[2024-08-23T09:55:27Z INFO screenpipe_audio::stt] Resampling from 48000 Hz to 16000 Hz
[2024-08-23T09:55:27Z INFO screenpipe_audio::stt] Total audio_frames processed: 3003, frames that include speech: 2266
[2024-08-23T09:55:28Z INFO screenpipe_server::core] Received transcription
[2024-08-23T09:55:28Z INFO screenpipe_server::core] Inserting audio chunk: "C:\\Users\\ABC\\.screenpipe\\data\\Microphone (4- High Definition Audio Device) (input)_2024-08-23_09-54-57.mp4"
[2024-08-23T09:55:28Z INFO screenpipe_server::db] Successfully chunked audio transcription into 1 chunks
[2024-08-23T09:55:28Z INFO screenpipe_audio::multilingual] detected language: ("en", "english")
[2024-08-23T09:55:28Z INFO screenpipe_audio::stt] 0.0s -- 30.0s
[2024-08-23T09:55:28Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T09:55:28Z INFO screenpipe_audio::stt] 0.0s-8.4s: you need like a mocha like from like from clockwork orange yeah is that top three
[2024-08-23T09:55:28Z INFO screenpipe_audio::stt] 8.4s-18.2s: Kubrick film for you like we're just pretty good I mean it's meant it jarring okay so first
[2024-08-23T09:55:28Z INFO screenpipe_audio::stt] 18.2s-22.5s: let's step back and big congrats on getting your link and plant it into a human and
[2024-08-23T09:55:28Z INFO screenpipe_audio::stt] no speech detected, skipping 4500 DecodingResult { tokens: [50258, 50259, 50359, 50364, 291, 13, 50464, 50257], text: "<|0.00|> you.<|2.00|>", avg_logprob: -1.3299248402253174, no_speech_prob: 0.9385878443717957, temperature: 0.0, compression_ratio: NaN }
[2024-08-23T09:55:28Z INFO screenpipe_server::core] Received transcription
[2024-08-23T09:55:28Z INFO screenpipe_server::core] Inserting audio chunk: "C:\\Users\\ABC\\.screenpipe\\data\\Headphones (4- High Definition Audio Device) (output)_2024-08-23_09-54-57.mp4"
[2024-08-23T09:55:28Z INFO screenpipe_server::db] Successfully chunked audio transcription into 2 chunks
[2024-08-23T09:55:33Z INFO screenpipe_server::resource_monitor] Runtime: 110s, Total Memory: 3% (0.62 GB / 23.94 GB), Total CPU: 129%
@louis030195
example to track accuracy & speed of OCR
Checking this; what are your current thoughts? How could we get repeatable behavior?
Maybe we run individual functions in a controlled environment, like you did for the OCR and vision benchmarks, and plot memory and other vitals with respect to time? That might be the first step toward finding the cause, and it would also make a good test case later.
i think we can start by measuring small parts that likely use a lot of compute (esp. those that don't need a monitor or audio device, so they can run in CI)
then, find a repeatable way to measure perf on a local computer (e.g. run a command that makes screenpipe run for X mins, and at the end we know memory & CPU over time, average, spikes, etc.)
then, if we can find a way to simulate a monitor and/or audio device in CI, that would be good
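For the last step, one option is to put audio capture behind a trait so CI can swap in a synthetic source instead of a real device. A minimal sketch; the `AudioSource` trait, `SineSource` type and `record_secs` function are hypothetical and not part of screenpipe or cpal:

```rust
use std::f32::consts::TAU;

/// Hypothetical abstraction over "something that produces mono f32 samples".
trait AudioSource {
    fn sample_rate(&self) -> u32;
    fn next_chunk(&mut self, len: usize) -> Vec<f32>;
}

/// Synthetic sine-wave source: deterministic and needs no hardware, so it can run in CI.
struct SineSource {
    sample_rate: u32,
    freq_hz: f32,
    amplitude: f32,
    phase: f32, // current phase in radians, kept in [0, TAU)
}

impl AudioSource for SineSource {
    fn sample_rate(&self) -> u32 {
        self.sample_rate
    }

    fn next_chunk(&mut self, len: usize) -> Vec<f32> {
        // Phase advance per sample for the requested frequency.
        let step = TAU * self.freq_hz / self.sample_rate as f32;
        let mut out = Vec::with_capacity(len);
        for _ in 0..len {
            out.push(self.amplitude * self.phase.sin());
            self.phase = (self.phase + step) % TAU;
        }
        out
    }
}

/// Pull `secs` seconds of audio from any source, as a capture loop would.
fn record_secs(source: &mut dyn AudioSource, secs: u32) -> Vec<f32> {
    let len = source.sample_rate() as usize * secs as usize;
    source.next_chunk(len)
}
```

Because the synthetic source always produces samples, it also makes it easy to compare against a source that returns nothing, like the silent output device on Windows described above.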
also we have this for runtime perf: https://github.com/mediar-ai/screenpipe/blob/main/screenpipe-server/src/resource_monitor.rs
but it's not super helpful atm; I added a feature to log it to disk (then you can feed the log into ChatGPT to analyse perf specifically)
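Once the resource_monitor output is on disk, it can also be summarised programmatically. A minimal sketch, assuming the line format seen in the logs in this thread (`Runtime: 10s, Total Memory: 2% (0.45 GB / 23.94 GB), Total CPU: 102%`) and treating the first GB figure as memory in use; `parse_resource_line` and `summarize` are hypothetical helpers, and the memory trend is a plain least-squares slope against runtime:

```rust
/// One sample parsed from a resource_monitor log line.
#[derive(Debug, PartialEq)]
struct ResourceSample {
    runtime_secs: u64,
    mem_used_gb: f64,
    cpu_pct: f64,
}

/// Parse the line format seen in this thread; returns None for any other line.
fn parse_resource_line(line: &str) -> Option<ResourceSample> {
    let start = line.find("Runtime: ")? + "Runtime: ".len();
    let (runtime, rest) = line[start..].split_once("s, Total Memory: ")?;
    let (_, rest) = rest.split_once('(')?; // skip the percentage, e.g. "2% "
    let (used, rest) = rest.split_once(" GB / ")?;
    let (_total, rest) = rest.split_once(" GB), Total CPU: ")?;
    Some(ResourceSample {
        runtime_secs: runtime.trim().parse().ok()?,
        mem_used_gb: used.trim().parse().ok()?,
        cpu_pct: rest.trim().trim_end_matches('%').parse().ok()?,
    })
}

#[derive(Debug)]
struct Summary {
    avg_mem_gb: f64,
    peak_mem_gb: f64,
    avg_cpu_pct: f64,
    peak_cpu_pct: f64,
    /// Least-squares slope of memory vs. runtime; a persistently positive value suggests growth.
    mem_trend_gb_per_hour: f64,
}

fn summarize(samples: &[ResourceSample]) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean_t = samples.iter().map(|s| s.runtime_secs as f64).sum::<f64>() / n;
    let avg_mem = samples.iter().map(|s| s.mem_used_gb).sum::<f64>() / n;
    let avg_cpu = samples.iter().map(|s| s.cpu_pct).sum::<f64>() / n;
    let peak_mem = samples.iter().map(|s| s.mem_used_gb).fold(f64::MIN, f64::max);
    let peak_cpu = samples.iter().map(|s| s.cpu_pct).fold(f64::MIN, f64::max);
    // slope = sum((t - mean_t) * (m - mean_m)) / sum((t - mean_t)^2), in GB per second
    let (mut cov, mut var) = (0.0, 0.0);
    for s in samples {
        let dt = s.runtime_secs as f64 - mean_t;
        cov += dt * (s.mem_used_gb - avg_mem);
        var += dt * dt;
    }
    let slope_per_sec = if var > 0.0 { cov / var } else { 0.0 };
    Some(Summary {
        avg_mem_gb: avg_mem,
        peak_mem_gb: peak_mem,
        avg_cpu_pct: avg_cpu,
        peak_cpu_pct: peak_cpu,
        mem_trend_gb_per_hour: slope_per_sec * 3600.0,
    })
}
```

Non-matching log lines are simply skipped by the parser, so the whole console output can be fed in. A growth pattern like the 1 GB → 2 GB → 4 GB reported earlier would show up as a clearly positive trend, while a rise followed by a drop back to baseline would pull the slope toward zero.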
i increased apple ocr speed by 30% already and increased accuracy by 25% now
i think we can start by measuring small parts that likely use a lot of compute (esp. those that don't need a monitor or audio device, so they can run in CI)
Right, which parts would you suggest we start the benchmarking with first?
Also, how do I run the vision part from the terminal with Windows native OCR? Currently I am not passing any arguments:
.\target\release\screenpipe.exe
Memory is constant at ~300 MB. For a minute or so it went up to around 600 MB but came back down (whatever it was, it must have been dropped after the processing finished). It has stayed there for the last ~30 minutes, with no errors in the console.
@louis030195
After about 60 iterations, Windows put it into efficiency mode; it's using around 45 MB of memory.
Currently at iteration 87, still the same memory utilization. @m13v @louis030195
Audio files are being recorded, also transcriptions are being inserted to the db
@chandeldivyam
check args
screenpipe -h
screenpipe --ocr-engine windows-native
looking at Apple Instruments, it seems the "chunking" part of screenpipe uses a ton of CPU
will push a benchmark for this
@chandeldivyam also if you can fix the windows version of this:
https://github.com/mediar-ai/screenpipe/blob/main/screenpipe-vision/benches/ocr_benchmark.rs
that would be great
cargo bench --bench ocr_benchmark
to run
@louis030195
check args
I've been running with .\target\release\screenpipe.exe --ocr-engine windows-native --audio-transcription-engine whisper-large
for 64 iterations (32 minutes).
Memory gradually increased from 300 MB to 1500 MB, but then suddenly dropped back to 300 MB. Screenpipe is using ~3 GB of GPU VRAM.
@chandeldivyam also if you can fix the windows version of this:
Checking this, will raise a PR
@m13v i will push a new version; i disabled chunking because it:
until it can be used, is useful, and solves #110
*memory is another problem
looking at Apple Instruments, it seems the "chunking" part of screenpipe uses a ton of CPU
Yes, similar observation on Windows. I can't pinpoint what it is, but every 30 seconds, when these logs appear:
[2024-08-23T11:31:41Z INFO screenpipe_server::core] Finished iteration 75 for device Headphones (4- High Definition Audio Device) (output)
[2024-08-23T11:31:41Z INFO screenpipe_audio::core] device: "Headphones (4- High Definition Audio Device) (output)"
[2024-08-23T11:31:41Z INFO screenpipe_audio::core] Recording Headphones (4- High Definition Audio Device) (output) for 30 seconds
[2024-08-23T11:31:41Z INFO screenpipe_audio::stt] Resampling from 48000 Hz to 16000 Hz
[2024-08-23T11:31:41Z INFO screenpipe_audio::stt] Total audio_frames processed: 3003, frames that include speech: 2519
[2024-08-23T11:31:41Z INFO screenpipe_server::resource_monitor] Runtime: 2270s, Total Memory: 5% (1.25 GB / 23.94 GB), Total CPU: 17%
[2024-08-23T11:31:42Z INFO screenpipe_audio::multilingual] detected language: ("en", "english")
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 0.0s -- 30.0s
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 0.0s-6.9s: for blind people so can you speak to stimulating the visual cortex I mean the
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 6.9s-12.0s: possibilities there are just incredible to be able to give that gift back to people who
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 12.0s-17.2s: don't have sight or even any aspect of that can you just speak to the challenges of
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 17.2s-21.5s: there's several challenges here many one of which is like you said from
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 21.5s-25.2s: recording to the stimulation just any aspect of that
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 30.0s -- 45.0s
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 0.0s-0.0s:
[2024-08-23T11:31:43Z INFO screenpipe_audio::stt] 0.0s-1.0s: You know,
[2024-08-23T11:31:43Z INFO screenpipe_server::core] Received transcription
[2024-08-23T11:31:43Z INFO screenpipe_server::core] Inserting audio chunk: "C:\\Users\\ABC\\.screenpipe\\data\\Headphones (4- High Definition Audio Device) (output)_2024-08-23_11-31-11.mp4"
[2024-08-23T11:31:43Z INFO screenpipe_server::db] Successfully chunked audio transcription into 2 chunks
There is a spike in CPU utilization.
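For scale, a back-of-envelope estimate of one 30-second chunk before and after the 48000 Hz -> 16000 Hz resampling in the log above. It assumes mono f32 samples, which may not match screenpipe's actual buffers, and the helper names are hypothetical.

```rust
/// Samples per channel in `seconds` of audio at `sample_rate_hz`.
fn samples(seconds: u64, sample_rate_hz: u64) -> u64 {
    seconds * sample_rate_hz
}

/// Bytes for that audio, assuming `channels` interleaved `f32` samples
/// (4 bytes each). Mono f32 is an assumption, not screenpipe's actual format.
fn buffer_bytes(seconds: u64, sample_rate_hz: u64, channels: u64) -> u64 {
    samples(seconds, sample_rate_hz) * channels * std::mem::size_of::<f32>() as u64
}
```

Under those assumptions one chunk is 1,440,000 samples (about 5.76 MB) at 48 kHz and 480,000 samples (about 1.92 MB) at 16 kHz, so the size of a single chunk is modest; what stands out is that the spike recurs on the same 30-second interval as the recording loop in the log.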
@chandeldivyam also if you can fix the windows version of this:
https://github.com/mediar-ai/screenpipe/pull/207
@louis030195
I ran .\target\release\screenpipe.exe --ocr-engine windows-native --audio-transcription-engine whisper-large
for over an hour. Memory increased for a while, then came back to normal:
400 MB initially -> 1500 MB (for a while) -> 400 MB (again for a while)
However, no images are being captured. I checked the DB and the storage location. What could the issue be? There is no error log either.
Image capture doesn't work from the app either.
@chandeldivyam I've never seen this issue before, EXCEPT when Windows Defender decided to delete screenpipe or similar.
So there are no error logs, or anything related to vision? You can also try adding --debug to get more info.
will check this
[2024-08-23T13:41:09Z DEBUG screenpipe_server::db] OCR text inserted into db successfully
[2024-08-23T13:41:09Z ERROR xcap::platform::impl_window] Access is denied. (0x80070005)
[2024-08-23T13:41:09Z ERROR xcap::platform::impl_window] Access is denied. (0x80070005)
Maybe this is the reason? I can see entries in the ocr_text table.
In video_chunks I can see C:\Users\ABC\.screenpipe\data\2024-08-23_13-40-52.mp4, but this file is 0 KB.
Maybe a permission issue?
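A quick way to see how many video chunks ended up empty is to list the zero-byte .mp4 files in the data directory (C:\Users\ABC\.screenpipe\data in this case). This is a std-only sketch; zero_byte_files is a hypothetical helper, not part of screenpipe.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists regular files in `dir` with extension `ext` whose size is 0 bytes,
/// e.g. video chunks that were registered in the DB but never written.
fn zero_byte_files(dir: &Path, ext: &str) -> io::Result<Vec<PathBuf>> {
    let mut empty = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_file()
            && path.extension().map_or(false, |e| e == ext)
            && entry.metadata()?.len() == 0
        {
            empty.push(path);
        }
    }
    empty.sort(); // deterministic order for reporting
    Ok(empty)
}
```

For example, zero_byte_files(Path::new(r"C:\Users\ABC\.screenpipe\data"), "mp4") returns the paths to compare against the video_chunks rows.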
@louis030195
BTW, for Mac, one of the memory issues I suspect is:
(to confirm)
Can you try running this?
https://github.com/nashaofu/xcap/blob/master/examples/window.rs
I ran the terminal as administrator and didn't get the error.
I think this should work, because we are getting screen captures: the ocr_text table is being filled. But let me check.
So the first source of leaks is: https://github.com/RustAudio/cpal/pull/894/files
The second is xcap:
I suggest we switch to scap,
even if we lose the app name and window name features for now (which I bet we can implement within 1-2 days), so that we fix the memory issues first.
Trying a hack now to fix it with xcap.
Update: any help switching to https://github.com/mediar-ai/scap would be good.
We need to implement app_name, window_name, and these:
https://github.com/CapSoftware/scap/issues/114
I found an issue that was making my RAM go crazy. After my changes, I have been recording the screen for the last 40 minutes and RAM hasn't moved an inch.
We never pop `frame_queue`, so it grows to its max size and takes up all the memory.
screenpipe-server\src\video.rs
let frame_queue = Arc::new(ArrayQueue::new(MAX_QUEUE_SIZE));
We initialize it, but it never gets popped anywhere in our codebase, so memory keeps increasing until the queue reaches MAX_QUEUE_SIZE. By then it's too late, since MAX_QUEUE_SIZE = 100.
By the time it gets there, my system has basically crashed. At a length of around 10 it was already using 5 GB, so it would need about 50 GB to reach 100.
I changed the max to 10 and have been recording at --fps 1 for 50 minutes with no memory increase.
As far as I can tell after checking everywhere, ocr_frame_queue and video_frame_queue are being consumed, but frame_queue is dead code. [I could be wrong here; I'm not sure why we used it, and I would love to know.]
We should do two things now ->
@louis030195 @m13v
[2024-08-23T19:50:31Z INFO screenpipe_server::video] Starting FFmpeg process for file: C:\Users\ABC\.screenpipe\data\2024-08-23_19-50-31.mp4
[2024-08-23T19:50:31Z INFO screenpipe_server::resource_monitor] Runtime: 3050s, Total Memory: 1% (0.30 GB / 23.94 GB), Total CPU: 55%
[2024-08-23T19:50:51Z INFO screenpipe_server::resource_monitor] Runtime: 3070s, Total Memory: 1% (0.19 GB / 23.94 GB), Total CPU: 50%
[2024-08-23T19:51:01Z INFO screenpipe_server::resource_monitor] Runtime: 3080s, Total Memory: 1% (0.30 GB / 23.94 GB), Total CPU: 46%
[2024-08-23T19:51:11Z INFO screenpipe_server::resource_monitor] Runtime: 3090s, Total Memory: 2% (0.50 GB / 23.94 GB), Total CPU: 49%
[2024-08-23T19:51:21Z INFO screenpipe_server::resource_monitor] Runtime: 3100s, Total Memory: 1% (0.20 GB / 23.94 GB), Total CPU: 55%
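The frame_queue problem described above can be illustrated in isolation. This is a std-only sketch, not screenpipe's code: Frame and BoundedFrameQueue are hypothetical, and a VecDeque stands in for crossbeam's ArrayQueue. A bounded queue caps memory at roughly capacity times frame size, but if no consumer ever pops, it simply sits full at that cap, which is why a capacity of 100 large frames was enough to exhaust memory.

```rust
use std::collections::VecDeque;

/// Hypothetical captured frame; the real screenpipe type differs.
#[derive(Debug)]
struct Frame {
    id: u64,
    rgba: Vec<u8>, // raw pixels: this payload is what makes each slot expensive
}

/// Holds at most `capacity` frames; when full, the oldest frame is evicted
/// so the newest ones are kept.
struct BoundedFrameQueue {
    capacity: usize,
    inner: VecDeque<Frame>,
}

impl BoundedFrameQueue {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, inner: VecDeque::with_capacity(capacity) }
    }

    /// Pushes a frame and returns the evicted oldest frame if the queue was full.
    fn push(&mut self, frame: Frame) -> Option<Frame> {
        let evicted = if self.inner.len() == self.capacity {
            self.inner.pop_front()
        } else {
            None
        };
        self.inner.push_back(frame);
        evicted
    }

    /// A consumer must call this; otherwise the queue just stays at capacity.
    fn pop(&mut self) -> Option<Frame> {
        self.inner.pop_front()
    }

    fn len(&self) -> usize {
        self.inner.len()
    }
}
```

Evicting the oldest frame mirrors crossbeam's ArrayQueue::force_push; plain ArrayQueue::push instead returns the value as an error when the queue is full. Either way, a queue nobody reads is pure overhead, so removing it (or lowering the cap, as done here) is what actually frees the memory.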
Perfect, create a pull request, seems like you've solved it! Congrats!
@m13v
@chandeldivyam Good job! Actually, this was my bad: I tried to solve this issue by using data structures that don't grow infinitely (e.g. VecDeque -> ArrayQueue) and by avoiding anti-patterns like Mutex, but it seems the if was incorrect.
I released a new version with this fix in the app and brew.
This does not solve the original issue, though (my memory still grows forever). That issue is essentially xcap and screencapture leaking memory because objects are not released in unsafe code blocks.
I suggest we now focus on switching to scap, which consists of:
implementing app_name, window_name, and these
Regarding macOS audio output, the leak seems acceptable for now because we don't call the function very frequently (once every 30 s, whereas Window::all() is probably called >5 times every frame per monitor).
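To put the two call rates side by side, a small worked calculation. The ">5 calls per frame" figure and the 0.2 fps macOS default come from this thread; the helper functions are hypothetical.

```rust
/// Calls per hour for a function invoked once every `interval_secs` seconds.
fn calls_per_hour_interval(interval_secs: f64) -> f64 {
    3600.0 / interval_secs
}

/// Calls per hour for a function invoked `calls_per_frame` times per captured
/// frame, on each of `monitors` monitors, at `fps` frames per second.
fn calls_per_hour_per_frame(calls_per_frame: f64, fps: f64, monitors: f64) -> f64 {
    calls_per_frame * fps * monitors * 3600.0
}
```

At 0.2 fps on one monitor that is at least 3600 calls per hour, versus 120 for the once-every-30-s audio call, so a per-call leak of the same size would accumulate at least 30 times faster on the capture path.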
@louis030195 Yes, we should also first create a mechanism to benchmark. I suspect the xcap issue must be with ScreenCaptureKit,
because when I ran screenpipe on Windows without audio for hours, the memory didn't move: it stayed the same as, or lower than, where it was in the first 30 seconds.
Maybe just something like a plot of memory against time during video capture? The current resource reporting (which we see in the terminal) plotted over time could help us see whether something is changing.
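One way to turn those samples into a single number is to fit a least-squares line of memory against runtime and report the slope. This is a sketch, not an existing screenpipe feature; Sample and memory_growth_gb_per_hour are hypothetical names.

```rust
/// One resource-monitor sample: seconds since start and memory used in GB.
#[derive(Debug, Clone, Copy)]
struct Sample {
    runtime_s: f64,
    memory_gb: f64,
}

/// Ordinary least-squares slope of memory against runtime, in GB per hour.
/// Returns None with fewer than two samples or when all runtimes are equal.
fn memory_growth_gb_per_hour(samples: &[Sample]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean_t = samples.iter().map(|s| s.runtime_s).sum::<f64>() / n;
    let mean_m = samples.iter().map(|s| s.memory_gb).sum::<f64>() / n;
    let (mut cov, mut var) = (0.0, 0.0);
    for s in samples {
        let dt = s.runtime_s - mean_t;
        cov += dt * (s.memory_gb - mean_m);
        var += dt * dt;
    }
    if var == 0.0 {
        return None;
    }
    Some(cov / var * 3600.0) // GB per second -> GB per hour
}
```

Short windows are noisy: the five resource-monitor lines at runtime 3050-3100 s swing between 0.19 and 0.50 GB, so the slope is only meaningful over tens of minutes. The earlier report of "1 GB, then 2 GB ten minutes later" comes out at about 6 GB/hour.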
xcap does not use ScreenCaptureKit on Mac; it uses an older Apple API.
ScreenCaptureKit is the newer Mac API, used in scap.
Using scap would also solve #63 (about 3-4 Linux users cannot use screenpipe because of it). scap is also 21x faster than xcap for Mac capture (tested).
We log resource usage to files here: https://github.com/mediar-ai/screenpipe/blob/main/screenpipe-server/src/resource_monitor.rs
You need to set the environment variable SAVE_RESOURCE_USAGE=true
before running the CLI.
And I made a Google Colab notebook to create charts from this data: https://colab.research.google.com/drive/1zELlGdzGdjChWKikSqZTHekm5XRxY-1r?usp=sharing
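The format of the SAVE_RESOURCE_USAGE file is not shown in this thread, so this sketch parses the resource_monitor line that appears in the terminal logs here (e.g. "Runtime: 3050s, Total Memory: 1% (0.30 GB / 23.94 GB), Total CPU: 55%"). It uses only the standard library; ResourceSample and parse_resource_line are hypothetical, and the parser would need updating if the log format changes.

```rust
/// Fields parsed from one resource_monitor log line.
#[derive(Debug, PartialEq)]
struct ResourceSample {
    runtime_s: u64,
    used_gb: f64,
    total_gb: f64,
    cpu_percent: u32,
}

/// Returns the text between the first `start` and the next `end` after it.
fn between<'a>(s: &'a str, start: &str, end: &str) -> Option<&'a str> {
    let from = s.find(start)? + start.len();
    let rest = &s[from..];
    let to = rest.find(end)?;
    Some(&rest[..to])
}

/// Parses "... Runtime: 3050s, Total Memory: 1% (0.30 GB / 23.94 GB), Total CPU: 55%".
/// Returns None for any line that does not match this shape.
fn parse_resource_line(line: &str) -> Option<ResourceSample> {
    let runtime_s = between(line, "Runtime: ", "s,")?.trim().parse::<u64>().ok()?;
    // Narrow to "1% (0.30 GB / 23.94 GB" before extracting the two numbers.
    let memory = between(line, "Total Memory: ", ")")?;
    let used_gb = between(memory, "(", " GB /")?.trim().parse::<f64>().ok()?;
    let total_gb = between(memory, "/ ", " GB")?.trim().parse::<f64>().ok()?;
    let cpu_percent = between(line, "Total CPU: ", "%")?.trim().parse::<u32>().ok()?;
    Some(ResourceSample { runtime_s, used_gb, total_gb, cpu_percent })
}
```

Each parsed (runtime_s, used_gb) pair can then be plotted or fed into a slope fit to get the memory-over-time view suggested above.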
In the past I worked on an observability team tracking the performance of billions of devices with Prometheus + Grafana, but I don't think that fits consumer software; we just have to write the metrics ourselves.
Ideally we should use this as well (although it's more for logging): https://github.com/tokio-rs/tracing
I think we should take inspiration from how they use it:
https://github.com/search?q=repo:huggingface/candle%20span&type=code
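The core idea of a span is a guard that records timing when a scope ends. The sketch below is a std-only stand-in to illustrate that idea; it is NOT the tracing crate's API (which adds span fields, nesting, and subscribers), and ScopeTimer is a hypothetical name.

```rust
use std::time::{Duration, Instant};

/// Std-only illustration of a timing span: measures how long a scope ran.
struct ScopeTimer<'a> {
    name: &'static str,
    start: Instant,
    sink: &'a mut Vec<(&'static str, Duration)>,
}

impl<'a> ScopeTimer<'a> {
    fn new(name: &'static str, sink: &'a mut Vec<(&'static str, Duration)>) -> Self {
        Self { name, start: Instant::now(), sink }
    }
}

impl Drop for ScopeTimer<'_> {
    /// Runs when the guard goes out of scope, like a span closing.
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        self.sink.push((self.name, elapsed));
    }
}
```

Wrapping each capture or OCR step in such a guard yields per-step durations that can be logged next to the resource-monitor numbers.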
Great, I think we've established that there is a tangible benefit to moving from xcap -> scap.
Let me look into both xcap and scap and try to migrate us to scap.
Things I am wondering right now:
Let me look into this section and I will update.
@chandeldivyam Good!
I will focus on "visible/focused window and app name" for macOS now
and push here: https://github.com/mediar-ai/scap
Once this is usable on macOS/Linux/Windows, let's replace xcap in screenpipe.
Perfect
I went through the screenpipe code as well.
Ideally we should take the response from scap and transform it into a backward-compatible format.
screenpipe-vision\src\monitor.rs
screenpipe-vision\src\capture_screenshot_by_window.rs
The functions that capture and validate the images would change, and we could transform scap's new output into our existing implementation; that is what I was thinking.
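A minimal sketch of that adapter idea, assuming the new backend returns raw RGBA pixels plus optional app/window metadata. ScapFrame and CaptureResult are hypothetical types, not scap's or screenpipe's actual structs; the point is that a From impl confines the change to one place, so downstream code keeps consuming the old shape.

```rust
/// Hypothetical shape of what a scap-based capture could return.
struct ScapFrame {
    width: u32,
    height: u32,
    rgba: Vec<u8>,
    app_name: Option<String>,
    window_name: Option<String>,
}

/// Hypothetical stand-in for the existing result type that OCR and the
/// DB insert path already consume.
#[derive(Debug, PartialEq)]
struct CaptureResult {
    width: u32,
    height: u32,
    rgba: Vec<u8>,
    app_name: String,
    window_name: String,
}

impl From<ScapFrame> for CaptureResult {
    fn from(f: ScapFrame) -> Self {
        // Keep downstream code unchanged: fill missing metadata with a
        // placeholder until app_name / window_name are implemented in scap.
        CaptureResult {
            width: f.width,
            height: f.height,
            rgba: f.rgba,
            app_name: f.app_name.unwrap_or_else(|| "unknown".to_string()),
            window_name: f.window_name.unwrap_or_else(|| "unknown".to_string()),
        }
    }
}
```

The "unknown" placeholder is an arbitrary choice for the interim period.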
This was not the issue.
The main issue is solved by upgrading xcap
to the latest version for macOS, which includes a fix for the memory leak.
The bounty is still live to solve the 2nd memory leak here:
https://github.com/louis030195/cpal-d
This is less of a problem because we call it infrequently, so it would only cause memory issues after screenpipe has been running for days.
I'm experiencing high CPU and memory utilization from the screenpipe process.
Under settings, the application shows the status "healthy".
Any updates would be greatly appreciated, so that I can run screenpipe alongside other memory-intensive applications.