DavidPhillipOster opened 2 years ago
Thanks for the good explanation! I will check it.
Pseudocode: create a VNImageRequestHandler with an image and a VNRecognizeTextRequest with your own callback. The callback gets called with an array of VNRecognizedTextObservation, a simple struct that holds a string of the recognized text (at most one line) and four x,y points that are the vertices of that string's bounding box. The coordinates are normalized to the size of the image: (0, 0) is the bottom left and (1, 1) is the top right.
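As a concrete illustration of that coordinate system: converting one of Vision's normalized boxes into top-left-origin pixel coordinates is just arithmetic. This is a minimal sketch in Python; the function name and the (x, y, width, height) tuple layout are my own, not part of the Vision API:

```python
def normalized_to_pixels(box, image_width, image_height):
    """Convert a Vision-style normalized rect (origin at the bottom left)
    to a top-left-origin pixel rect (x, y, width, height)."""
    nx, ny, nw, nh = box  # all values in 0.0 ... 1.0
    x = nx * image_width
    # Flip the y axis: Vision's origin is the bottom-left corner,
    # while image pixel coordinates usually start at the top left.
    y = (1.0 - ny - nh) * image_height
    return (x, y, nw * image_width, nh * image_height)

# A normalized box covering the top-left quarter of a 1000x500 image:
normalized_to_pixels((0.0, 0.5, 0.5, 0.5), 1000, 500)  # → (0.0, 0.0, 500.0, 250.0)
```

The same flip is needed whatever drawing or indexing code consumes the OCR results, since most image APIs put (0, 0) at the top left.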
Actual code: https://github.com/DavidPhillipOster/MockSimpleComic is an example of use. In its source code directory, OCRVision/OCRVision.h (implementation in the matching .m file) OCRs an image and puts the result in an allText property. You can see the call to it in OCRVision/OCRTracker.m, but for Spotlight you need only OCRVision/OCRVision.h and OCRVision/OCRVision.m.

CBZ format documents are just a zip of numbered image files (.jpg or .png), often used for graphic novels. Graphic novels are also encoded as .epubs, also a zip format, where the .html files are trivial and just point to .jpg or .png files in the .epub's directory tree. When running on macOS, you can weak-link to the Apple-supplied Vision framework, which, given an image, returns an array of structs: each struct has the vertices of a rectangle and a string that is the OCR'ed text in that rectangle.
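Since a .cbz is just an ordinary zip archive, the page images can be enumerated with nothing beyond the standard library. A minimal sketch (the helper name is my own):

```python
import zipfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def cbz_pages(path):
    """Return the image entries of a .cbz archive in page order.

    A .cbz is a plain zip file whose members are numbered images,
    so sorting the names yields reading order.
    """
    with zipfile.ZipFile(path) as archive:
        names = [n for n in archive.namelist()
                 if n.lower().endswith(IMAGE_EXTENSIONS)]
    return sorted(names)
```

Each entry can then be read out with `ZipFile.read()` and handed to the OCR step; the same approach works for the images inside an .epub.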
This proposed enhancement would allow searching for text even when that text occurs only within images.
https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac587fc4c is an example of accessing the Vision framework from Python.