Open louis030195 opened 3 days ago
I also experimented with:
- https://github.com/huggingface/candle
- https://github.com/mlc-ai/mlc-llm
- https://huggingface.co/mychen76/mistral7b_ocr_to_json_v1
- https://github.com/LlamaEdge/LlamaEdge

and other stuff.
I think the biggest thing is the trade-off between speed, cost, local vs. cloud, data privacy, etc.
I mean, multimodal models are supposed to be far superior to OCR, but let's see.
Dropping some useful resources:
- https://kevinchen.co/blog/rewind-ai-app-teardown
- https://developer.apple.com/documentation/visionkit