ashgansh opened 4 days ago
according to claude:
For continuous screen recording, MP4 is the most efficient in terms of storage, balancing quality and file size.
rewind:
my idea is to make screenpi.pe usable:
not only in terms of storage
but also in terms of compute
say post-processing or compression could be done in the cloud too, to save compute locally, at the cost of added network load
@ashgansh
I started experimenting with downscaling images. If the purpose is to pipe content to an LLM, I think it would be beneficial to reduce the number of input tokens we send. (the llm doesn't care if an image looks good, so we should be able to make some interesting tradeoffs). At the moment I feel that this implementation is not the way to go, but thought I might share it here to facilitate future work on this.
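to make the tradeoff concrete, here's a back-of-the-envelope sketch of why downscaling cuts token cost. it assumes a tile-based vision billing model with 512 px tiles — an assumption modeled on common vision APIs, not a property of any specific LLM:

```python
import math

# Assumption: the vision model bills image input per fixed-size tile
# (512 px tile edge is a guess, not a documented screenpipe/LLM value).
def tile_count(width: int, height: int, tile: int = 512) -> int:
    """Number of fixed-size tiles needed to cover a width x height image."""
    return math.ceil(width / tile) * math.ceil(height / tile)

native = tile_count(2560, 1440)   # a typical high-res capture
halved = tile_count(1280, 720)    # the same frame downscaled 2x

print(native, halved)  # 15 6
```

so under this model a 2x downscale cuts the per-screenshot token cost by roughly 2.5x, with the LLM mostly not caring about the lost fidelity.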
yeah i think running llama3 at 100% 24/7 would set my mac on fire
probably a hybrid approach again: smaller models and larger models for different use cases
i think .mp4 is good for the rewind use case. so it depends on which direction screenpipe wants to go.
either:
a) pure piping: data goes to stdout and another unix-like tool takes on the next task, e.g. screenpipe | llm "prompt"
b) storage: screenpipe --location=/some/path
i don't think you'll be able to do a) with .mp4 (or maybe i just don't see it?). so it would force screenpipe into a b)-type solution.
@ashgansh

> i think .mp4 is good for the rewind use case. so it depends on which direction screenpipe wants to go.
> either: a) pure piping: data goes to stdout and another unix-like tool takes on the next task, e.g. screenpipe | llm "prompt"
> b) storage: screenpipe --location=/some/path
> i don't think you'll be able to do a) with .mp4 (or maybe i just don't see it?). so it would force screenpipe into a b)-type solution.
curious to know why you think the unix-like pipe approach is interesting? still considering whether to separate responsibilities
so it would be like:
screenpipe
# here it would stream json objects containing screenshots, text, audio, metadata, etc.
# could be used like
screenpipe | jq -r '.audio' | whisper | chatgpt "how many times did i use hedge words"
screenpipe | jq -r '.text' | chatgpt "keep a log of my day"
screenpipe | jq -r '.metadata.app' | chatgpt "maintain a markdown table of how much time i spend on apps"
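a custom consumer for such a stream would be tiny too. a hedged sketch of the consuming side, assuming newline-delimited JSON and a made-up "text" field (the tick schema isn't defined anywhere yet) — roughly what the jq step above does:

```python
import json
import sys

# Assumes the hypothetical screenpipe stream is NDJSON: one JSON
# object ("tick") per line. Field names like "text" are assumptions.
def extract_field(lines, field):
    """Yield `field` from each JSON line that has it, skipping the rest."""
    for line in lines:
        tick = json.loads(line)
        if field in tick:
            yield tick[field]

if __name__ == "__main__":
    # usage: screenpipe | python consumer.py
    for value in extract_field(sys.stdin, "text"):
        print(value)
```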
or through SDK:
const screenPipe = new ScreenPipe();
for await (const tick of screenPipe.stream()) {
  db.from("memories").add(tick);
}

const screenPipe = new ScreenPipe();
for await (const tick of screenPipe.stream()) {
  s3.store("/memories/" + new Date()).add(tick);
}
or similar
so the screenpipe package/lib/cli/sdk would only contain the code that gathers consumer hardware info (the computer's inputs & outputs) and streams it to stdout or the sdk
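the producer side of that contract could be as small as this sketch — one JSON object per line on stdout, so jq and the SDK both get line-delimited ticks. the tick schema here (timestamp/text/metadata.app) is hypothetical, not screenpipe's actual format:

```python
import json
import sys
from datetime import datetime, timezone

# Hypothetical tick schema: timestamp + captured text + source app.
def make_tick(text: str, app: str) -> dict:
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "text": text,
        "metadata": {"app": app},
    }

def emit(tick: dict, out=sys.stdout) -> None:
    """Write one tick as a single NDJSON line and flush for the pipe."""
    out.write(json.dumps(tick) + "\n")
    out.flush()  # flush per tick so downstream tools see data immediately
```

with that, `screenpipe | jq -r '.metadata.app'` just reads one line per tick and never needs to know about capture internals.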
what are pros & cons?
tl;dr
More Info
Screenpipe generates screenshots of around ~10MB each.
I tried to modify the code to add compression (see full source below).
problem is that it puts a lot of strain on the cpu + it's really slow, around 2-3s per screenshot. probably some ways to optimize it though.
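the cpu-vs-size knob is easy to see even without a real screenshot. a hedged sketch using lossless zlib on a synthetic ~1MB buffer as a stand-in (not the jpeg path i actually prototyped; real screenshot data will compress very differently):

```python
import time
import zlib

# Synthetic ~1MB "frame buffer"; real screenshots behave differently,
# this only demonstrates the compression-level vs cpu-time tradeoff.
frame = bytes(range(256)) * 4096

for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(frame, level)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"level={level} size={len(packed)} bytes time={elapsed:.1f} ms")
```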
ran some prototypes with downscaling 2x + jpeg conversions with screenpipe
Naive implementation: