fmonpelat opened 1 year ago
Wouldn't it be better to expose all the options? Otherwise, every user who wants to try an option that hasn't been added will ask us to add it. I found this great blog post that explains which mode you should use: https://pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/
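For reference, these are the page segmentation modes that "all options" would cover. They are transcribed below from Tesseract's PageSegMode enum (tesseract/publictypes.h) into a TypeScript enum. The enum is only an illustration for this discussion; it is not an existing tesseract-wasm export.

```typescript
// Tesseract's page segmentation modes, transcribed from the PageSegMode enum
// in tesseract/publictypes.h. Illustrative only: not a tesseract-wasm export.
enum PageSegMode {
  PSM_OSD_ONLY = 0,               // Orientation and script detection only
  PSM_AUTO_OSD = 1,               // Automatic page segmentation with OSD
  PSM_AUTO_ONLY = 2,              // Automatic segmentation, but no OSD or OCR
  PSM_AUTO = 3,                   // Fully automatic segmentation, no OSD
  PSM_SINGLE_COLUMN = 4,          // A single column of text of variable sizes
  PSM_SINGLE_BLOCK_VERT_TEXT = 5, // A single uniform block of vertical text
  PSM_SINGLE_BLOCK = 6,           // A single uniform block of text
  PSM_SINGLE_LINE = 7,            // Treat the image as a single text line
  PSM_SINGLE_WORD = 8,            // Treat the image as a single word
  PSM_CIRCLE_WORD = 9,            // A single word in a circle
  PSM_SINGLE_CHAR = 10,           // Treat the image as a single character
  PSM_SPARSE_TEXT = 11,           // Find as much text as possible, in no order
  PSM_SPARSE_TEXT_OSD = 12,       // Sparse text with OSD
  PSM_RAW_LINE = 13,              // Single text line, bypassing Tesseract-specific hacks
}
```

Modes 0, 1 and 12 rely on orientation and script detection (OSD).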
This feature is critical for me. Will this be merged? Thanks!
I'm open to offering some segmentation controls, but I don't intend to expose every single option that Tesseract supports internally as the main API. One reason is that some of the modes simply don't work (e.g. orientation and script detection is not fully available, so some of the mode flags listed in PageSegMode in this PR have no impact). I'm thinking of something like:
The OCREngine class already has this via the setVariable API. This low-level API might just mean exposing setVariable for use with OCRClient as well.

@robertknight Thanks for the suggestion, but is there an example of how I can use the setVariable API in the browser? It's not in the type definitions, and I can't find the function at places like (ocr as any).setVariable or (ocr as any)._worker.setVariable either. Thanks!
The setVariable method does not yet exist on the OCRClient class, which you are using; it exists only on the lower-level OCREngine class, which lives in the web worker. A totally unsupported, will-break-in-the-future way of accessing it via the client would be ocrClient._ocrEngine.setVariable(name, value). If a public API were added, it would be something like ocrClient.setVariable(name, value). What would be useful to know is which segmentation modes would be most useful to expose via a high-level API.
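A minimal sketch of the unsupported workaround described above, wrapped in a small typed helper so the private access is at least isolated in one place. It assumes, as stated above, that OCRClient keeps a private _ocrEngine field with a setVariable(name, value) method; that field is not in the public typings and may disappear in any release. The helper name setVariableUnsafe and the EngineLike type are hypothetical, and the value is assumed to be a string, matching Tesseract's own SetVariable.

```typescript
// Hypothetical shape of the private engine handle. This is NOT part of
// tesseract-wasm's public typings; it only describes what the workaround
// relies on.
interface EngineLike {
  setVariable(name: string, value: string): unknown;
}

/**
 * Sets a Tesseract variable through the client's private `_ocrEngine` field.
 * Unsupported: may break in any future release.
 */
async function setVariableUnsafe(
  client: object,
  name: string,
  value: string,
): Promise<void> {
  const engine = (client as { _ocrEngine?: EngineLike })._ocrEngine;
  if (!engine || typeof engine.setVariable !== "function") {
    throw new Error("This client does not expose _ocrEngine.setVariable");
  }
  // The engine lives in a web worker, so the call may return a promise;
  // awaiting also works if it returns a plain value.
  await engine.setVariable(name, value);
}
```

For example, await setVariableUnsafe(ocrClient, "tessedit_pageseg_mode", "7") would request Tesseract's single-line mode; whether the engine actually honours that variable depends on its internals.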
Thanks for the PR. Generally speaking, the API that the C++ code exposes to JS is a higher-level API that does not expose Tesseract internals, with the exception of the API for setting Tesseract's internal variables, which is used at the user's own risk. The JS API in turn is simplified compared to Tesseract's own C++ API, which makes it easier for users to grok.
Rather than expose every possible page segmentation mode, I would suggest exposing just the ones that we know have uses and actually work. Which mode(s) do you need for your use case? I will soon have a use for one of the "treat image as a single line" modes (I'm not sure which yet; let's assume PSM_SINGLE_LINE for now).
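One way to follow this suggestion is a high-level option that accepts only a curated subset of modes and maps each one to its Tesseract PageSegMode number. The option strings, the SegmentationMode type and the pageSegModeVariable function are all hypothetical, and which modes belong in the subset (and actually work in tesseract-wasm) is exactly the open question here; "single-line" stands in for the PSM_SINGLE_LINE assumption above. The resulting pair targets Tesseract's tessedit_pageseg_mode variable and could be handed to a low-level setVariable call.

```typescript
// Hypothetical high-level option: only a curated subset of Tesseract's
// page segmentation modes can be requested.
type SegmentationMode = "auto" | "single-block" | "single-line";

// Numeric values from Tesseract's PageSegMode enum.
const PSM_VALUES: Record<SegmentationMode, number> = {
  auto: 3,           // PSM_AUTO: fully automatic segmentation, no OSD
  "single-block": 6, // PSM_SINGLE_BLOCK: a single uniform block of text
  "single-line": 7,  // PSM_SINGLE_LINE: treat the image as a single text line
};

/**
 * Converts a high-level mode into the [name, value] pair that would be
 * passed to a low-level setVariable(name, value) call.
 */
function pageSegModeVariable(mode: SegmentationMode): [string, string] {
  return ["tessedit_pageseg_mode", String(PSM_VALUES[mode])];
}
```

Keeping the public type a closed union means modes known not to work (such as the OSD ones mentioned earlier) simply cannot be requested, while the low-level setVariable escape hatch remains available for everything else.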