How to pass --psm to detect text as a single column of text?

robertknight / tesseract-wasm

JS/WebAssembly build of the Tesseract OCR engine for use in browsers and Node

https://robertknight.github.io/tesseract-wasm/

BSD 2-Clause "Simplified" License


How to pass --psm to detect text as a single column of text? #58

Open dheimoz opened 2 years ago

dheimoz commented 2 years ago

Hey @robertknight,

Great work you have been doing here. It performs excellently in Vue 3 with Vite. I would like to pass the --psm 4 parameter to the Tesseract engine so that it assumes the text is a single column. Sometimes the engine treats the text as 2 or 3 columns, and the recognized text does not make sense. More info: https://stackoverflow.com/questions/44619077/pytesseract-ocr-multiple-config-options

I looked through the source code but could not find how to pass that option.

Thanks.

robertknight commented 2 years ago

Hello - There isn't currently an option to configure the page segmentation mode (psm). It would make sense to expose this configuration though. The API could look something like:

ocrClient.loadImage(image, {
  segmentationMode: mode,
});

Do you have any examples of images where the text columns are incorrectly recognized?

wydengyre commented 2 years ago

@dheimoz this should currently work if you are using the engine API:

engine.setVariable("tessedit_pageseg_mode", "4");
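A small sketch of how the suggestion above might be wrapped, assuming the engine object exposes `setVariable(name, value)` as shown in the comment. The `PSM` table mirrors the numbering of Tesseract's `PageSegMode` enum (the same numbers `--psm` accepts). `VariableSetter`, `setPageSegMode`, and `RecordingEngine` are hypothetical names, used only so the example runs without the real engine.

```typescript
// Numeric values of Tesseract's PageSegMode enum (the numbers `--psm` takes).
const PSM = {
  OSD_ONLY: 0,
  AUTO_OSD: 1,
  AUTO_ONLY: 2,
  AUTO: 3,
  SINGLE_COLUMN: 4,
  SINGLE_BLOCK_VERT_TEXT: 5,
  SINGLE_BLOCK: 6,
  SINGLE_LINE: 7,
  SINGLE_WORD: 8,
  CIRCLE_WORD: 9,
  SINGLE_CHAR: 10,
  SPARSE_TEXT: 11,
  SPARSE_TEXT_OSD: 12,
  RAW_LINE: 13,
} as const;

type PsmValue = (typeof PSM)[keyof typeof PSM];

// Hypothetical minimal shape: anything with a setVariable(name, value) method.
interface VariableSetter {
  setVariable(name: string, value: string): void;
}

// Tesseract variables are set as strings, so the numeric mode is stringified.
function setPageSegMode(engine: VariableSetter, mode: PsmValue): void {
  engine.setVariable("tessedit_pageseg_mode", String(mode));
}

// Stand-in for the real engine (assumption): it only records the calls it receives.
class RecordingEngine implements VariableSetter {
  readonly calls: Array<[string, string]> = [];
  setVariable(name: string, value: string): void {
    this.calls.push([name, value]);
  }
}

const demoEngine = new RecordingEngine();
setPageSegMode(demoEngine, PSM.SINGLE_COLUMN);
```

With a real engine, the call would presumably need to happen before the image is recognized; that ordering is an assumption and is not confirmed in this thread.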

dheimoz commented 2 years ago

Thanks, I will give it a try.

fmonpelat commented 1 year ago

Hi, I'm not using the engine API because I want the option to use wasm and wasm-fallback from the web worker. If I change the ocrClient to send options to the engine and send you a PR, would you be interested in this change?

robertknight commented 1 year ago

Yes, I'd be willing to accept that.

fmonpelat commented 1 year ago

@robertknight I'm seeing that embind doesn't support overloaded functions. I changed lib.cpp to use this function:

  OCRResult LoadImage(const Image& image, const tesseract::PageSegMode pageSegMode);

and this one to support LoadImage with only one argument:

  OCRResult LoadImage(const Image& image);

This page says it doesn't support overloaded functions: https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#overloaded-functions. I will need to invoke LoadImage from JS under another name, am I correct? What would be the best option?

robertknight commented 1 year ago

I can see a few options:

1. Make the segmentation mode (or an options struct containing the mode) a required argument, and modify the JS code to always provide it. This seems easiest.
2. Add a separate method that is called after LoadImage to set the segmentation mode, and call this from JS after calling loadImage.

The API that lib.cpp exposes to JS does not expose Tesseract internal types/enums directly, but rather abstracts them into something that is more convenient to use from the JS side and allows Tesseract version changes to be handled entirely in lib.cpp. See for example the TextUnit enum and various small structs that are exported to JS.
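To illustrate the first option together with the abstraction described above, here is a rough sketch in which the JS-facing API uses its own mode names and always passes a mode to the native layer. Everything in it is hypothetical: the `SegmentationMode` names, `PSM_BY_MODE`, the two-argument `NativeEngine.loadImage`, and the `"auto"` fallback are assumptions for illustration, not the library's actual API. Only the numeric values are taken from Tesseract's `PageSegMode` enum.

```typescript
// Hypothetical JS-facing mode names, deliberately decoupled from Tesseract's enum.
type SegmentationMode = "auto" | "single-column" | "single-block" | "single-line";

// Translation table from the public names to Tesseract's numeric PageSegMode values.
// Typing it as a Record makes the compiler reject a missing or misspelled mode.
const PSM_BY_MODE: Record<SegmentationMode, number> = {
  "auto": 3,
  "single-column": 4,
  "single-block": 6,
  "single-line": 7,
};

// Hypothetical native binding where the mode is a required second argument,
// so no overload is needed on the C++ side.
interface NativeEngine<Image> {
  loadImage(image: Image, psm: number): void;
}

interface LoadImageOptions {
  segmentationMode?: SegmentationMode;
}

// JS wrapper: callers may omit the option, but the native call always gets a mode.
// The "auto" fallback is an arbitrary choice for this sketch.
function loadImage<Image>(
  native: NativeEngine<Image>,
  image: Image,
  options: LoadImageOptions = {},
): void {
  const mode = options.segmentationMode ?? "auto";
  native.loadImage(image, PSM_BY_MODE[mode]);
}

// Stub native layer (assumption) that records the mode it was given.
const seenModes: number[] = [];
const stubNative: NativeEngine<string> = {
  loadImage(_image, psm) {
    seenModes.push(psm);
  },
};

loadImage(stubNative, "page.png", { segmentationMode: "single-column" });
loadImage(stubNative, "page.png");
```

Keeping the translation in one table means a change in Tesseract's numbering would only touch that table, which is the same motivation given above for not exposing internal enums directly.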

fmonpelat commented 1 year ago

@robertknight I see that you are using the function iterator_level_from_unit to convert from TextUnit to the Tesseract type PageIteratorLevel. Have you tried casting instead? There are a lot of enum options for PSM... Sorry about the delay, I'm making progress whenever I have time.

fmonpelat commented 1 year ago

@robertknight, here's the PR: #67

fmonpelat commented 1 year ago

NVM, I saw that you used iterator_level_from_unit to convert between types.