Closed benjaminshafii closed 3 months ago
according to claude:
For continuous screen recording, MP4 is the most efficient in terms of storage, balancing quality and file size.
rewind:
my idea is to make screenpi.pe usable both:
in terms of storage
and in terms of compute
e.g. post-processing or compression could also be done in the cloud to save local compute, at the cost of added network load
@ashgansh
I started experimenting with downscaling images. If the purpose is to pipe content to an LLM, I think it would be beneficial to reduce the number of input tokens we send (the LLM doesn't care whether an image looks good, so we should be able to make some interesting tradeoffs). At the moment I feel that this implementation is not the way to go, but I thought I'd share it here to facilitate future work on this.
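For reference, a minimal sketch of what 2x downscaling can look like on a raw RGBA buffer, using a simple 2x2 box filter. This is not the implementation from this issue; the `Frame` type and `downscale2x` name are hypothetical, and the frame is assumed to be stored row-major with 4 bytes per pixel.

```typescript
// Hypothetical frame type: raw RGBA pixels, 4 bytes per pixel, row-major.
interface Frame {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Halve width and height by averaging each 2x2 block of pixels (box filter).
// Odd trailing rows/columns are dropped.
function downscale2x(src: Frame): Frame {
  const width = Math.floor(src.width / 2);
  const height = Math.floor(src.height / 2);
  const data = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      for (let c = 0; c < 4; c++) {
        // Sum the same channel over the four source pixels of this block.
        let sum = 0;
        for (let dy = 0; dy < 2; dy++) {
          for (let dx = 0; dx < 2; dx++) {
            const sx = 2 * x + dx;
            const sy = 2 * y + dy;
            sum += src.data[(sy * src.width + sx) * 4 + c];
          }
        }
        data[(y * width + x) * 4 + c] = Math.round(sum / 4);
      }
    }
  }
  return { width, height, data };
}
```

The output has a quarter of the pixels of the input. How much that reduces input tokens depends on how a given model tokenizes images, so that part would need measuring.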
yeah i think running llama3 at 100% 24/7 right now would set my mac on fire
probably again a hybrid approach: smaller models combined with larger models for different use cases
i think .mp4 is good for the rewind use case, so it depends on which direction screenpipe wants to go.
either:
a) pure piping: data goes to stdout and is offloaded to another unix-like tool that takes on the next task, e.g. screenpipe | llm "prompt"
b) storage: screenpipe --location=/some/path
i don't think you'll be able to do (a) with .mp4 (or maybe i just don't see it?), so it would force screenpipe into a (b)-type solution.
@ashgansh
curious to know why you think the unix-like pipe approach is interesting? i'm still considering separating responsibilities
so it would be like:
screenpipe
# here it would stream json objects containing screenshots, text, audio, metadata, etc.
# could be used like
screenpipe | jq '.audio' | whisper | chatgpt "how many times did i use hedge words"
screenpipe | jq '.text' | chatgpt "keep log of my day"
screenpipe | jq '.metadata.app' | chatgpt "maintain a markdown table of how much time i spend on apps"
or through SDK:
// store each tick in a database
const screenPipe = new ScreenPipe();
for await (const tick of screenPipe.stream()) {
  db.from("memories").add(tick);
}

// or store each tick in object storage
const screenPipe = new ScreenPipe();
for await (const tick of screenPipe.stream()) {
  s3.store("/memories/" + new Date().toISOString()).add(tick);
}
or similar
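To make the streaming idea concrete, here is a rough sketch of how a consumer could read newline-delimited JSON ticks with an async generator. Everything here is an assumption for illustration: the `Tick` field names, `parseTicks`, `fakeLines` and `collectApps` are hypothetical and not part of any existing screenpipe SDK.

```typescript
// Hypothetical tick shape; the thread only says ticks carry screenshots,
// text, audio and metadata, so these field names are assumptions.
interface Tick {
  text?: string;
  metadata?: { app?: string };
}

// Turn a stream of newline-delimited JSON lines into parsed ticks.
async function* parseTicks(lines: AsyncIterable<string>): AsyncGenerator<Tick> {
  for await (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    yield JSON.parse(trimmed) as Tick;
  }
}

// Stand-in for the capture process's stdout, one JSON object per line.
async function* fakeLines(): AsyncGenerator<string> {
  yield '{"text":"hello","metadata":{"app":"Terminal"}}';
  yield "";
  yield '{"text":"world","metadata":{"app":"Browser"}}';
}

// Example consumer: collect the app name from every tick that has one.
async function collectApps(lines: AsyncIterable<string>): Promise<string[]> {
  const apps: string[] = [];
  for await (const tick of parseTicks(lines)) {
    if (tick.metadata?.app) apps.push(tick.metadata.app);
  }
  return apps;
}
```

Because the consumer only depends on an `AsyncIterable<string>`, the same loop could sit behind a pipe from stdin or behind an SDK call, which is roughly the separation of responsibilities being discussed.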
so the screenpipe package/lib/cli/sdk would only contain the code that gathers consumer hardware data (the computer's inputs & outputs) and streams it to stdout or the sdk
what are pros & cons?
@ashgansh fyi i've been reflecting on this and ended up trying to properly split responsibilities in the branch "audio" (after seeing the code was getting too messy)
idea is to have:
screenpipe-vision: capture vision + processing like OCR (local or remote) - stream data to stdout / sdk (no db, api code)
screenpipe-audio: capture audio + processing like transcription (local or remote) - stream data to stdout / sdk (no db, api code)
screenpipe-server: like on main, mostly glue of vision into mp4 videos, audio, api & db
i'm trying to design this lib so that it's easy to extend it with typescript instead of rust because i've noticed 99.9% of programmers seem afraid of rust
ideally it would be easy to go from screenpipe to building nextjs apps
e.g. 30s to get started w screenpipe and max 5 min for prod config plumbing between screenpipe and your preferred compute/storage that runs 24/7 and connected to a nextjs UI
@louis030195 you might have some luck compressing JPEGs into mp4s. I'm using another open source tool and I've been able to compress over 100GB of 4k JPEGs down to about 1GB per month, at 1 screenshot every 10 seconds. I'm not sure how you could index that one mp4 file though. That might be the tricky part.
@DmacMcgreg we do indeed already encode all frames and audio in mp4, that's how we only use <30gb/m even though we're recording multi audio + screen 24/7 :)
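A back-of-the-envelope sketch of the numbers in play, combining the 10-second capture interval mentioned above with the ~10MB per screenshot figure quoted in this issue. Mixing those two figures is an assumption made purely for illustration (they come from different tools), a month is taken as 30 days, and the function names are hypothetical. `frameIndexAt` shows one possible way to index into a single video, assuming frames were encoded at a fixed interval.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Number of frames captured over `days` at one frame every `intervalSeconds`.
function framesCaptured(intervalSeconds: number, days: number): number {
  return Math.floor((days * SECONDS_PER_DAY) / intervalSeconds);
}

// Uncompressed storage in gigabytes (1 GB = 1000 MB) for those frames.
function rawGigabytes(mbPerFrame: number, intervalSeconds: number, days: number): number {
  return (framesCaptured(intervalSeconds, days) * mbPerFrame) / 1000;
}

// Index of the frame covering a timestamp, given the recording start time.
// Only valid if frames were encoded at a fixed interval with no gaps.
function frameIndexAt(startMs: number, timestampMs: number, intervalSeconds: number): number {
  return Math.floor((timestampMs - startMs) / (intervalSeconds * 1000));
}

console.log(framesCaptured(10, 30)); // 259200 frames in 30 days
console.log(rawGigabytes(10, 10, 30)); // 2592 GB before any compression
console.log(frameIndexAt(0, 65_000, 10)); // 6
```

Under these assumptions the raw screenshots would come to roughly 2.6TB per month, which gives a sense of how much work the mp4 encoding is doing to reach the figures quoted above.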
tl;dr
More Info
screenpipe generates screenshots of around 10MB each.
I tried to modify the code to add compression (see full source below).
The problem is that it puts a lot of strain on the CPU and it's really slow, around 2-3s per screenshot. There are probably some ways to optimize it though.
ran some prototypes with downscaling 2x + jpeg conversions with screenpipe
Naive implementation: