rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0

OCR number #245

Closed: polaris0227 closed this issue 6 years ago

polaris0227 commented 6 years ago

Summary: I need to read numbers from an image that contains only digits. The problem is that I sometimes get letters instead of digits. Please advise if anyone has a solution. Thanks.

Steps to reproduce the issue:

  1. For instance, when the image shows "51", the expected result is "51", but sometimes the result is "Sl". I know the image contains only digits, but I found no way to force the output to be digits.

Expected result: "51"

Actual result: "Sl"

Tess-two version: 8.0.0

Android version: 6.0

Phone/device model: Huawei

Phone/device architecture (armeabi, armeabi-v7a, x86, mips, arm64-v8a, x86_64, mips64):

Link to training data used: eng.traineddata

Link to image used as input: https://drive.google.com/open?id=17GNvsR2DlGf2yl9zZuppmfM8dxt81ymE

rmtheis commented 6 years ago

You can restrict the output to digits by setting a character whitelist. See https://stackoverflow.com/q/38650899/667810

polaris0227 commented 6 years ago

Thanks for your reply. I have a question about the whitelist: isn't it just a filter? I mean, if it detects "51" as "Sl", then it would show no result at all, and if it detects "51" as "5l", then it would show only "5". I'm not sure I'm explaining this correctly.

On Thu, Jun 21, 2018 at 5:00 PM, Robert Theis notifications@github.com wrote:

Closed #245 https://github.com/rmtheis/tess-two/issues/245.


rmtheis commented 6 years ago

No, the whitelist isn't just a post-OCR filter. With the whitelist, you'd see "51" where you would otherwise have seen "Sl".
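To make the difference concrete, here is a toy Java model (not Tesseract's actual code) contrasting "recognize freely, then filter" with "recognize under a whitelist". It assumes, hypothetically, that the recognizer gives each glyph a score for several candidate characters; the scores in `sampleScores()` and all method names are invented for illustration. In tess-two itself the whitelist is set on the `TessBaseAPI` instance with `setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789")` before calling `getUTF8Text()`, as in the Stack Overflow answer linked above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WhitelistDemo {
    // Hypothetical per-glyph classifier scores (higher = more likely).
    // Each map holds the candidate characters for one glyph position.
    static List<Map<Character, Double>> sampleScores() {
        List<Map<Character, Double>> glyphs = new ArrayList<>();
        Map<Character, Double> first = new LinkedHashMap<>();
        first.put('S', 0.55);
        first.put('5', 0.45);
        Map<Character, Double> second = new LinkedHashMap<>();
        second.put('l', 0.50);
        second.put('1', 0.40);
        second.put('I', 0.10);
        glyphs.add(first);
        glyphs.add(second);
        return glyphs;
    }

    // Returns the highest-scoring candidate allowed by the whitelist,
    // or 0 if none is allowed. A null whitelist means "no restriction".
    static char best(Map<Character, Double> candidates, String whitelist) {
        char bestChar = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Character, Double> e : candidates.entrySet()) {
            boolean allowed = whitelist == null || whitelist.indexOf(e.getKey()) >= 0;
            if (allowed && e.getValue() > bestScore) {
                bestScore = e.getValue();
                bestChar = e.getKey();
            }
        }
        return bestChar;
    }

    // Post-OCR filter: pick the unrestricted best label, then drop it
    // if it is not in the whitelist.
    static String postFilter(List<Map<Character, Double>> glyphs, String whitelist) {
        StringBuilder sb = new StringBuilder();
        for (Map<Character, Double> g : glyphs) {
            char c = best(g, null);
            if (whitelist.indexOf(c) >= 0) sb.append(c);
        }
        return sb.toString();
    }

    // Constrained recognition: each glyph gets its best *allowed* label.
    static String constrained(List<Map<Character, Double>> glyphs, String whitelist) {
        StringBuilder sb = new StringBuilder();
        for (Map<Character, Double> g : glyphs) {
            char c = best(g, whitelist);
            if (c != 0) sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        List<Map<Character, Double>> glyphs = sampleScores();
        System.out.println("unrestricted: " + constrained(glyphs, null));
        System.out.println("post-filter:  " + postFilter(glyphs, digits));
        System.out.println("whitelist:    " + constrained(glyphs, digits));
    }
}
```

Under these made-up scores, the unrestricted pass yields "Sl", the post-filter yields an empty string (both top choices are letters), and the whitelist-constrained pass yields "51", because the recognizer falls back to the best-scoring digit for each glyph instead of discarding it.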

polaris0227 commented 6 years ago

Thank you very much for your help. 😊
