mapozyan / caps

Power Search: A full-text search plugin for Calibre
GNU General Public License v3.0

On macOS, the Calibre Power Search plugin should use the Vision framework to index text in image formats like CBZ #7

Open DavidPhillipOster opened 2 years ago

DavidPhillipOster commented 2 years ago

.CBZ documents are just a zip of numbered image files, .jpg or .png. The format is often used for graphic novels. Graphic novels are also encoded as .epub files, also a zip format, where the .html files are trivial and just point to .jpg or .png files in the .epub's directory tree. When running on macOS, you can weak-link dynamically to the Apple-supplied Vision framework, which, given an image, returns an array of observations: each one has the vertices of a rectangle and a string that is the OCR'ed text in that rectangle.
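To illustrate the first step, here is a minimal sketch of pulling the page images out of a zip-based book, using only Python's standard `zipfile` module. The function name `iter_archive_images` and the extension list are my own choices for illustration, not anything taken from the plugin; a real indexer would probably also want to follow the image references in an .epub's .html files rather than rely on file names alone.

```python
import zipfile

# Extensions treated as page images; an assumption for this sketch.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def iter_archive_images(archive):
    """Yield (member_name, image_bytes) for each image in a zip-based book.

    `archive` may be a path or a file-like object. Members are yielded in
    name order, which matches page order when the files are numbered.
    """
    with zipfile.ZipFile(archive) as zf:
        names = sorted(
            name for name in zf.namelist()
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
        for name in names:
            yield name, zf.read(name)
```

Each yielded image could then be handed to an OCR step and the recognized text added to the book's index.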

This proposed enhancement request would allow searching for text even if that text only occurs within images.

https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac587fc4c is an example of accessing the Vision framework from Python.

mapozyan commented 2 years ago

Thanks for the good explanation! I will look into it.

DavidPhillipOster commented 2 years ago

Pseudocode: create a VNImageRequestHandler with an image and a VNRecognizeTextRequest with your own callback, then have the handler perform the request. The callback gets called with an array of VNRecognizedTextObservation, a simple object whose top candidate holds a string of the recognized text, at most one line, and which has 4 x,y points that are the vertices of the bounding box of that string. The coordinates are normalized to the size of the image: Vision puts (0, 0) at the bottom left and (1, 1) at the top right, so y has to be flipped for top-left-origin drawing code.
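A rough Python sketch of that pseudocode, following the same PyObjC approach as the gist linked earlier. It assumes macOS with the pyobjc Vision bindings installed, and I have not run it against the plugin; `recognize_text` and `to_pixel_rect` are hypothetical names. Vision reports bounding boxes in normalized coordinates with the origin at the bottom left of the image, so the second helper flips y to get a top-left-origin pixel rectangle.

```python
def recognize_text(image_path):
    """Return [(text, (x, y, w, h)), ...] for one image file.

    Boxes are normalized to the image size with a bottom-left origin.
    """
    # Imported lazily: these PyObjC modules are only available on macOS.
    import Vision
    from Foundation import NSURL

    results = []

    def on_complete(request, error):
        # Called by Vision with the finished request.
        for observation in request.results() or []:
            candidates = observation.topCandidates_(1)
            if candidates:
                box = observation.boundingBox()
                results.append((
                    str(candidates[0].string()),
                    (box.origin.x, box.origin.y,
                     box.size.width, box.size.height),
                ))

    url = NSURL.fileURLWithPath_(image_path)
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
    request = Vision.VNRecognizeTextRequest.alloc().initWithCompletionHandler_(
        on_complete
    )
    handler.performRequests_error_([request], None)
    return results


def to_pixel_rect(box, image_width, image_height):
    """Convert a normalized bottom-left-origin box to top-left-origin pixels.

    Returns (left, top, width, height).
    """
    x, y, w, h = box
    left = x * image_width
    top = (1.0 - y - h) * image_height  # flip the y axis
    return (left, top, w * image_width, h * image_height)
```

For full-text search only the strings matter; the boxes become useful if the plugin ever wants to highlight a hit on the page.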

DavidPhillipOster commented 2 years ago

Actual code: https://github.com/DavidPhillipOster/MockSimpleComic is a working example. In its source code directory, OCRVision/OCRVision.h (the implementation is in the matching .m file) OCRs an image and puts the result in an allText property. You can see the call to it in OCRVision/OCRTracker.m, but for Spotlight, you need only OCRVision/OCRVision.h and OCRVision/OCRVision.m.