nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/

NVDA OCR recognizes tables and text with long spaces in between in a wrong order #13478

Open Hadi-1624 opened 2 years ago

Hadi-1624 commented 2 years ago

Hi there,

I believe this bug has existed ever since NVDA gained OCR support through the Microsoft OCR engine. When OCR is performed on an image, application, or game screen that contains a table, or a row of text with wide gaps between items, the result is presented and read in the wrong order. It is as if the OCR is trying to help by forcing the text to be read in columns, which makes things very confusing.

I use the NVDA OCR function to work with networking software and to play games, and the results are extremely confusing to work with because they are presented in a different order from what is shown on the screen. Examples are provided below, following the bug report format for this repository.

A small note: the JAWS screen reader by Freedom Scientific can also use the Microsoft OCR engine, and it does not have the issue that NVDA has.

Steps to reproduce:

Use NVDA to perform OCR on an image or a game screen with tables or rows of text. I will provide an image for testing purposes.

  1. Download the attached image (Screenshot 2022-03-15 05 09 19): right-click it and choose Save as.

  2. Right-click the saved image and open it with Microsoft Paint. Alternatively, launch Microsoft Paint, press Control+O, and select the image to open. This lets you OCR the image cleanly.

  3. Make sure NVDA is set to object view, press NVDA+R to perform OCR, and read the text.

  4. The image is a character sheet with stats and an about section at the bottom. You can see how confusingly the OCR presents the information.

Actual behavior:

Example 1, NVDA OCR result: name: hit point: stamina: move: fitness: Terran scout 15 9 5 10

Example 2, NVDA OCR result: name family name phone number John Miller 001xxxxx department status Tech support level 2 online

Example 3, NVDA OCR result: military pulse engine MK I early warp drive I thermal reactor mk I commercial pulse engine MK I stable field warp drive I thermal reactor mk II Reactive engines mk I Hyper deep drive I fusion reactor MK I

Expected behavior:

Example 1, how the image reads when you read the text naturally: name: Terran scout hit point: 15 stamina: 9 move: 5 fitness: 10

Example 2, how the image reads when you read the text naturally: name John family name miller phone number 001xxxxx department Tech support level 2 status online

Example 3, how the image reads when you read the text naturally: military pulse engine MK I commercial pulse engine MK I reactive engines MK I early warp drive I stable field warp drive I hyper deep drive I thermal reactor mk I thermal reactor mk II fusion reactor MK I

There is no need for more examples. Note, however, that when a table or row of text contains more than three items, the result becomes even more confusing, for example with a row of 8 items.
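To make the expected reading order concrete, here is a minimal Python sketch of one possible way to order recognized words by their position on screen. It is not NVDA's or JAWS's actual code, and nothing here is confirmed as the cause of the bug. The `Word` class, the function names, and all pixel coordinates are made up for illustration; only the word texts come from example 1. The idea is to group words whose vertical centres fall on the same text line, then read each line from left to right.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR word: its text plus a bounding box in pixels."""
    text: str
    x: int
    y: int
    width: int
    height: int


def words_in_visual_order(words):
    """Group words into visual rows, then sort each row left to right."""
    rows = []  # each row is a list of words that share one text line
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        centre = word.y + word.height / 2
        for row in rows:
            ref = row[0]
            # Same row if this word's vertical centre lies inside the
            # vertical extent of the first word placed in that row.
            if ref.y <= centre <= ref.y + ref.height:
                row.append(word)
                break
        else:
            rows.append([word])
    return [sorted(row, key=lambda w: w.x) for row in rows]


def render(words):
    """Join the ordered rows into text, one visual row per line."""
    return "\n".join(
        " ".join(w.text for w in row) for row in words_in_visual_order(words)
    )


# Words listed in the column-first order reported in example 1.
# All coordinates are invented for this illustration.
sample = [
    Word("name:", 10, 10, 60, 20),
    Word("hit", 10, 40, 30, 20),
    Word("point:", 45, 40, 60, 20),
    Word("stamina:", 10, 70, 80, 20),
    Word("move:", 10, 100, 60, 20),
    Word("fitness:", 10, 130, 80, 20),
    Word("Terran", 300, 10, 70, 20),
    Word("scout", 375, 10, 60, 20),
    Word("15", 300, 40, 25, 20),
    Word("9", 300, 70, 12, 20),
    Word("5", 300, 100, 12, 20),
    Word("10", 300, 130, 25, 20),
]

print(render(sample))
```

With these invented coordinates, each label ends up on the same line as its value, which is the order described under expected behavior. A real screen would need a more tolerant row test, since text lines are rarely perfectly aligned.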

System configuration

NVDA installed/portable/running from source:

Installed on C drive

NVDA version:

2021.3.3

Windows version:

Windows 10, version 21H1, OS build 19043

Name and version of other software in use when reproducing the issue:

Any image, game, or application that does not have accessible text and therefore requires OCR to read.

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

Any version since the OCR capability was introduced on Windows 10.

If NVDA add-ons are disabled, is your problem still occurring?

yes

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

yes

BlindGuyNW commented 2 years ago

I have to second @Hadi-1624 on this issue. NVDA OCR is very hard to use for any layout which contains data in columns. Jaws is able to produce far more reasonable OCR output from even fairly complex input. The underlying OCR engine is the same, at least on Win 10 with the proper Jaws setting enabled.

CyrilleB79 commented 2 years ago

Hello

You both explain that the OCR engine in JAWS is the same as NVDA's, i.e. Microsoft's. Do you have a link to documentation that confirms this? At the time I was using JAWS, the OCR was provided by Nuance, not by Microsoft. If this has changed, just let me know. Thanks.

BlindGuyNW commented 2 years ago

There is an entry in the what’s new section for Jaws 2021, as follows.

When using the Convenient OCR feature to recognize the current control, screen, or window, you now have the option to use the Microsoft OCR engine as this may provide better OCR results for onscreen images than OmniPage, which is the default. For example, if you press INSERT+SPACEBAR, followed by O, and then W to recognize the graphical window in an application and you find the results less than satisfactory, open Settings Center, select the Use Microsoft OCR For Screen Recognitions check box, and then try the OCR again.

SOURCE: https://support.freedomscientific.com/Downloads/JAWS/JAWSWhatsNew?version=2021

Hadi-1624 commented 2 years ago

@CyrilleB79 JAWS uses its own OCR engine by default. However, in Settings Center you can select a check box to use the Microsoft OCR engine instead. Once this is done, OCR is performed with Microsoft's engine. It is interesting that when JAWS performs OCR with Microsoft's engine, the results appear as fast as NVDA's, whereas Freedom Scientific's engine has a very noticeable lag when scanning. I have also compared some OCR results: the reading order issue aside, both detect the same text, and if NVDA OCR misses a block of text, JAWS OCR with the Microsoft engine misses it as well. I hope this helps.