louis030195 / screen-pipe

Turn your screen into actions (using LLMs). Inspired by adept.ai, rewind.ai, Apple Shortcut. Rust + WASM.
https://screenpi.pe
MIT License

ocr #7

Open louis030195 opened 3 days ago

louis030195 commented 3 days ago

dropping useful resources

https://kevinchen.co/blog/rewind-ai-app-teardown
https://developer.apple.com/documentation/visionkit

louis030195 commented 3 days ago

i also experimented with

https://github.com/huggingface/candle
https://github.com/mlc-ai/mlc-llm
https://huggingface.co/mychen76/mistral7b_ocr_to_json_v1
https://github.com/LlamaEdge/LlamaEdge

and other stuff

i think the biggest thing is the trade-off between speed, cost, running locally vs. in the cloud, and data privacy

louis030195 commented 3 days ago

i mean, multimodal models are supposed to be far superior to ocr, but let's see