change the default value of param `charWhitelist` from `null` to `""` for method `OpenCvSharp.Text.OCRTesseract.Create()`

shimat / opencvsharp

OpenCV wrapper for .NET

Apache License 2.0

change the default value of param `charWhitelist` from `null` to `""` for method `OpenCvSharp.Text.OCRTesseract.Create()` #1542

Closed n0099 closed 1 year ago

n0099 commented 1 year ago

This will save anyone who uses Tesseract to recognize non-Latin characters from struggling with #873, and probably #1364, in the future. Fixes #873.

shimat commented 1 year ago

https://github.com/opencv/opencv_contrib/blob/ed1873bc2c58f1c2dc94f98c816be0d39068995f/modules/text/include/opencv2/text/ocr.hpp#L166

`const char* char_whitelist=NULL`

I don't think this modification is a major problem, but it creates a difference in specifications from the original OpenCV. Since OpenCvSharp has been made to have the same specifications as the original OpenCV (C++) as much as possible, this modification is unacceptable.

n0099 commented 1 year ago

I've tested another OpenCV wrapper library, Emgu.CV.OCR.Tesseract. It doesn't have this fallback from `null` to `[0-9a-zA-Z]` when no whitelist is provided.

https://github.com/shimat/opencvsharp/issues/1541#issuecomment-1460868223

since OpenCvSharp has the highest priority to make the specification the same as the original OpenCV, this is still not an issue I should face. Could you please submit an issue to opencv/opencv instead of here?
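The fallback discussed in this thread can be sketched in C++ roughly as follows. This is a minimal illustration, not the OpenCV source: `effective_whitelist` is a hypothetical helper, and the alphanumeric default string is assumed from the `[0-9a-zA-Z]` fallback described above.

```cpp
#include <string>

// Hypothetical helper (not part of OpenCV or OpenCvSharp): models the
// fallback described in this thread for the char_whitelist argument.
std::string effective_whitelist(const char* char_whitelist) {
    // Assumed default, taken from the "[0-9a-zA-Z]" fallback mentioned above.
    static const char* const kDefault =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // NULL is treated as "argument not provided", so the default applies.
    if (char_whitelist == nullptr) {
        return kDefault;
    }

    // Any non-null string, including "", is used exactly as given.
    return char_whitelist;
}
```

Under this model, `effective_whitelist(nullptr)` yields the Latin alphanumeric set, while `effective_whitelist("")` yields an empty whitelist. That difference is why the PR proposes `""` as the default: callers recognizing non-Latin text would no longer be restricted unless they asked to be.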

shimat commented 1 year ago

I am not willing to conform to Emgu.CV. Sorry but I will close this PR.

n0099 commented 1 year ago

the same specifications as the original OpenCV (C++) as much as possible

The upstream change has now been merged: https://github.com/opencv/opencv_contrib/pull/3462