Recognition of texts in whole documents.

fernando-jose-silva commented 7 years ago

First of all, I would like to thank you for the beautiful work on NVDA as a whole, and more specifically today for the character recognition (OCR) functionality.

As a suggestion for this great functionality, I would like to propose that NVDA be able to recognize an entire document at one time. Today we have to recognize the document one visible page, or one visible part of the screen, at a time.

My use case is this: I was able to recognize and read a page from a PDF file, but to read the next page in the document I have to move to that page and then run the recognition again.

I find this unreliable: depending on the Adobe Reader settings, the new page is not always fully visible on the screen. It is also annoying to have to run the recognition for every new page.

Another issue is that, to recognize the next page, I had to press Esc and then run the recognition a couple of times before I could read it, which makes reading the entire document slow.

Perhaps as a future feature, NVDA could also recognize structures such as tables, even if the recognition results are not always accurate.

jcsteh commented 7 years ago

Recognising tables is something the OCR engine needs to support. Win 10 OCR certainly doesn't support this. I don't know of any free OCR engine that does.

Brian1Gaff commented 7 years ago

The issue with picture PDFs is really a whole other subject, in my view. Most OCR done in screen readers tends to be for reading a screen of controls and the like. What is needed, in my view, is for Adobe to include support for what you ask for in their reader in the first place. Brian

fernando-jose-silva commented 7 years ago

Thanks for the comments. Reading documents with OCR, not only PDFs but also Outlook messages and images such as JPGs, is very important to me and is part of my daily corporate work. I doubt that any company that maintains these file formats will add character recognition to its tools. My proposal is that NVDA make this task easier. I usually use character recognition in screen readers for a quick read; when I find that I need to read a document in more detail, I turn to professional software such as ABBYY FineReader. Other screen readers already have the ability to recognize entire documents using OCR, but I'm trying to make NVDA my main work tool.

nvaccess / nvda

Recognition of texts in whole documents. #7380