Feature page segmentation

robertknight / tesseract-wasm

JS/WebAssembly build of the Tesseract OCR engine for use in browsers and Node

https://robertknight.github.io/tesseract-wasm/

BSD 2-Clause "Simplified" License


Feature page segmentation #67

Open fmonpelat opened 1 year ago

robertknight commented 1 year ago

Thanks for the PR. The API that C++ exposes to JS is, generally speaking, a higher-level API that does not expose Tesseract internals, except for the API that sets Tesseract internal variables, which is used at the user's own risk. The JS API in turn is simplified compared to Tesseract's own C++ API, so it is easier for users to grok.

Rather than expose every possible page segmentation mode, I would suggest exposing just the ones that we know have uses and that actually work. Which mode(s) do you need for your use case? I will soon have a use for one of the "treat image as a single line" modes (I'm not sure which yet, let's assume PSM_SINGLE_LINE for now).

fmonpelat commented 1 year ago

Wouldn't it be better to expose all options? Otherwise, every user who wants to try a mode that has not been added will ask us to add it. I found this great blog post that explains which mode you should use: https://pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/

AndyGura commented 1 year ago

This feature is critical for me. Will this be merged? Thanks!

robertknight commented 1 year ago

I'm open to offering some segmentation controls, but I don't intend to offer every single option that Tesseract supports internally as the main API. One reason is that some of the modes simply don't work (e.g. orientation and script detection is not fully available, so some of the mode flags listed in PageSegMode in this PR have no effect). I'm thinking of something like:

- A high-level API which provides options like: normal segmentation, single column, single line.
- A low-level API with no stability guarantees. The OCREngine class already has this via the setVariable API. This low-level API might just mean exposing setVariable for use with OCRClient as well.
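
To make the two-tier idea concrete, here is a minimal sketch of how a small set of high-level modes could be layered on top of a setVariable-style call. The `SegmentationMode` names, the `applySegmentationMode` helper, and the `RecordingEngine` stand-in are hypothetical and not part of tesseract-wasm. The numbers are the values of Tesseract's `PageSegMode` enum, and `tessedit_pageseg_mode` is the Tesseract variable that holds the mode. Mapping "normal segmentation" to `PSM_AUTO`, and the variable being honored when set this way in this build, are both assumptions.

```typescript
// Hypothetical high-level option names; not part of tesseract-wasm's API.
type SegmentationMode = "auto" | "single-column" | "single-line";

// Values of Tesseract's PageSegMode enum for the three modes above.
// Treating "normal segmentation" as PSM_AUTO is an assumption.
const PAGE_SEG_MODE: Record<SegmentationMode, number> = {
  auto: 3, // PSM_AUTO
  "single-column": 4, // PSM_SINGLE_COLUMN
  "single-line": 7, // PSM_SINGLE_LINE
};

// Minimal shape of the low-level API discussed in this thread.
interface VariableSetter {
  setVariable(name: string, value: string): void;
}

// Translate a high-level mode into the underlying Tesseract variable.
function applySegmentationMode(
  engine: VariableSetter,
  mode: SegmentationMode
): void {
  engine.setVariable("tessedit_pageseg_mode", String(PAGE_SEG_MODE[mode]));
}

// Stand-in engine that records calls, so the sketch runs without the WASM build.
class RecordingEngine implements VariableSetter {
  readonly calls: Array<[string, string]> = [];

  setVariable(name: string, value: string): void {
    this.calls.push([name, value]);
  }
}

const demoEngine = new RecordingEngine();
applySegmentationMode(demoEngine, "single-line");
console.log(demoEngine.calls);
```

Keeping the mode names separate from the numeric enum values means the high-level API can stay stable even if a mode is later dropped or remapped.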

AndyGura commented 1 year ago

@robertknight thanks for the suggestion, but is there an example of how I can use the setVariable API in the browser? It's not in the type definitions, and I cannot find the function in places like (ocr as any).setVariable or (ocr as any)._worker.setVariable. Thanks!

robertknight commented 1 year ago

The setVariable method does not yet exist for the OCRClient class which you are using, only for the lower-level OCREngine class which exists in the web worker. A totally-unsupported, will-break-in-future way of accessing it via the client would be ocrClient._ocrEngine.setVariable(name, value). If a public API were added, it would be something like ocrClient.setVariable(name, value). What would be useful to know is which segmentation modes would be most useful to expose via a high-level API.
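
For anyone trying the unsupported route in TypeScript, a defensive wrapper might look like the sketch below. The `trySetVariable` helper and the `EngineLike` interface are hypothetical. `_ocrEngine` is the private field named in the comment above, and whether it holds the engine directly or a promise for it is an assumption, so the helper awaits it either way. The stub client at the end exists only so the sketch runs without a worker.

```typescript
// Minimal shape of an engine exposing the setVariable method named in this thread.
interface EngineLike {
  setVariable(name: string, value: string): void | Promise<void>;
}

// Reaches into the private `_ocrEngine` field (unsupported, may break).
// Returns false instead of throwing when the field or method is missing.
async function trySetVariable(
  client: unknown,
  name: string,
  value: string
): Promise<boolean> {
  const holder = client as {
    _ocrEngine?: EngineLike | Promise<EngineLike>;
  } | null;
  if (!holder || holder._ocrEngine === undefined) {
    return false;
  }
  // `await` accepts both a plain value and a promise.
  const engine = await holder._ocrEngine;
  if (!engine || typeof engine.setVariable !== "function") {
    return false;
  }
  await engine.setVariable(name, value);
  return true;
}

// Stub client standing in for a real OCRClient instance.
const seen: string[] = [];
const stubClient = {
  _ocrEngine: Promise.resolve({
    setVariable(name: string, value: string): void {
      seen.push(`${name}=${value}`);
    },
  }),
};

trySetVariable(stubClient, "tessedit_pageseg_mode", "7").then((ok) => {
  console.log(ok, seen);
});
```

Returning a boolean rather than throwing lets calling code fall back to default segmentation if a future release renames or removes the private field.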