sumatrapdfreader / sumatrapdf

SumatraPDF reader
http://www.sumatrapdfreader.org
GNU General Public License v3.0

Not accessible to screen readers #321

Open rkingett opened 9 years ago

rkingett commented 9 years ago

I am a blind user who is using the free screen reader NVDA. I would really love to use this program but the documents, even though they are text documents, will not read at all with NVDA.

Consider making this accessible in the future?

mohammad-suliman commented 9 years ago

Yes, I agree. I believe there are blind people who want to use this program other than me and the person who reported this issue. This program focuses on simplicity; therefore, it is attractive for screen reader users. I realized that you intend to support Narrator, which is a magnificent step forward. It would be even better if you considered supporting NVDA, a free and open source screen reader, which is more advanced than Narrator and is widely used in the blind community.

rkingett commented 9 years ago

I hope they do!

SumatraPeter commented 9 years ago

I don't know whether adding support for each screen reader requires dedicated code or not, but if we're talking of popularity and wide usage amongst the blind community then JAWS has to be supported IMO.

rkingett commented 9 years ago

One thing developers have noted in the past is that if it is fully accessible to NVDA it will work with all others

SumatraPeter commented 9 years ago

"One thing developers have noted in the past is that if it is fully accessible to NVDA it will work with all others"

If that's true then it'll certainly make things easier for the SumatraPDF developers, because writing (and maintaining) custom code for each screen reader would be a nightmare.

cary-rowen commented 4 years ago

I wish it would work with screen readers, just like FoxitReader.

GitHubRulesOK commented 4 years ago

Just to be clear, the current SumatraPDF build can copy screen contents to the clipboard, and many speech apps will READ the clipboard content out loud.

There are other apps that scrape the image pixels from the zoomed screen image and convert them to speech; to them, SumatraPDF is no different from any other input. It is the pixel quality that determines how garbled the translated result is.

The results of linking with NVDA and MS Narrator have varied over the years as either side of the link has changed. It should be noted that mainly the heading CONTROL FRAMEWORK or the bookmarks to the left are visible to screen narrators and DO work, so Narrator will confirm the active selection. It is the rendered image area that visual users see which is not accessible to screen readers.

SumatraPDF is not customized to work with the screen reader JAWS (Job Access With Speech) "just like FoxitReader".

rkingett commented 4 years ago

Unfortunately, the developer does not want this to become accessible to NVDA, and similar. Unless there's a law we can use to force this reader to be accessible, it's gonna be a very tough fight. Also, copy and paste doesn't work. You should know that.

GitHubRulesOK commented 4 years ago

@rkingett I think you misread the support info. The developer does support and link to NVDA (on that help page) as well as mention Narrator, but neither is able to read the surface textual image. They cannot "read out" the contents of a comic book picture of words, and the glyphs for text in a PDF are not always there to be copied to the system.

The NVDA programmers should not be forced by law to use SumatraPDF as a screen image reader; they have other issues to attend to.

I just re-tested NVDA and could use it to fully read the PDF Title and SumatraPDF pull down controls without any problem

I guess adding an image of the areas that can be read by NVDA within SumatraPDF is possibly not of assistance to you, but I am showing others that all of the area around the image canvas is accessible and can be read out loud by either Narrator or NVDA. I also include the description of OpenBook software, which can be integrated to read either scanned contents such as magazines or PDF file contents. Finally, I illustrate just one of the many applications I have tested that can work within the SumatraPDF canvas area to read out the underlying text, or to do very efficient OCR on image-based (scanned) PDFs.

image

I have recently illustrated QTranslate (a free QuestSoft application) in the forum, which can read screen content amongst many other tasks. Here I am illustrating Capture2Text (from http://capture2text.sourceforge.net), which is a very lightweight yet powerful OCR reader addition.

SumatraPeter commented 4 years ago

Let's ignore image-based PDFs for the moment. What I want to know is, what is technically preventing screen readers from being able to access the content of text-based PDFs (and possibly other supported ebook formats) displayed by Sumatra? Hopefully @kjk can elaborate on what the problem is, and whether it would require massive rework to be able to improve matters in terms of accessibility.

kjk commented 4 years ago

For standard Windows controls (like buttons or menus), Windows provides the necessary code to support accessibility features used by screen readers etc. Everything else (like PDF text rendered on screen) needs custom code. This code is not easy to write and there is a lot of other code that needs to be written.
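A toy model of that distinction, as a sketch only. The class and method names (`Control`, `StandardButton`, `CustomCanvas`, `accessible_text`) are hypothetical and are not the Windows accessibility API or SumatraPDF code; the point is only that a stock control answers a screen reader's query for free, while a custom-drawn canvas answers nothing until someone writes the code for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Control:
    """Toy UI element. accessible_text() is what a screen reader could query."""
    name: str

    def accessible_text(self) -> Optional[str]:
        return None  # nothing is exposed unless a subclass provides it


class StandardButton(Control):
    """Stand-in for a stock control: the toolkit answers on the app's behalf."""

    def accessible_text(self) -> Optional[str]:
        return f"{self.name} button"


@dataclass
class CustomCanvas(Control):
    """Stand-in for a custom-drawn page view: only pixels, unless the
    application itself implements the text-exposing code."""
    page_text: str = ""
    exposes_text: bool = False  # hypothetical switch for "custom code written"

    def accessible_text(self) -> Optional[str]:
        return self.page_text if self.exposes_text else None


if __name__ == "__main__":
    print(StandardButton("Open").accessible_text())
    print(CustomCanvas("page", page_text="Hello").accessible_text())
    print(CustomCanvas("page", page_text="Hello", exposes_text=True).accessible_text())
```

In a real application the "switch" is the hard part: the page text, its positions, and the caret and selection all have to be reported through the platform's accessibility interfaces.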

cary-rowen commented 4 years ago

"Unfortunately, the developer does not want this to become accessible to NVDA, and similar. Unless there's a law we can use to force this reader to be accessible, it's gonna be a very tough fight. Also, copy and paste doesn't work. You should know that."

Copying content to the clipboard so that screen readers can read it aloud would be a very bad experience. Visually impaired users do not only need the pure text content; rich formatting information is even more useful.

GitHubRulesOK commented 4 years ago

@man0528 I agree that scraping scanned text needs dedicated software to indicate emphasis such as quoted / braced / italics / emboldened / color / tabular contents etc., and many discussions abound around how that audio descriptive content could detract from a plain audio stream. I have no idea how commercial products such as OpenBook deal with that aspect. I am simply suggesting that TTS screen scraping via the clipboard is better than "nothing".

My understanding is that applications can capture the canvas area and process it themselves; thus NVDA developers could do likewise and add the descriptive features they deem necessary.

For assistive web tagged PDF reading, the development is driven by Adobe Labs, and for their tagged PDFs there is nothing to beat Acrobat Reader, especially when combined with Narrator to read out the surrounding application interface controls. I tested several other VIP dedicated readers, and all the free options that I tested failed to deliver one way or another. SumatraPDF with Narrator was often better at getting access to the desired paragraph, even if it then needed to be copied into the TTS reader.

If you're looking for a low cost WAC approved assisted e-book reader, there is Optimilia Studios Read-Aloud via the Microsoft Store.

SumatraPeter commented 4 years ago

"Everything else (like PDF text rendered on screen) needs custom code. This code is not easy to write and there is a lot of other code that needs to be written."

Unless the code has to be written at the Sumatra level, clearly it's not a high priority even for a fairly large organization like Artifex that is responsible for MuPDF.

GitHubRulesOK commented 6 months ago

On Windows the inbuilt Screen Reader is MS Edge (currently 2024 is vIII+) which does a very good job working with both PDF and ePub plus many others but only PDF has good Natural Reading.

Here it is reading this page as a PDF.

image

However, it does not attempt ePub.

image

A very capable audio reader is Balabolka, which converts many file types to text and can read them in different ways.

I have explained many times that SumatraPDF can easily send the filename and/or page number to any suitably powerful external reader via the command line. Edge, TTSReader, Balabolka and many others can be used to convert documents into text for TTS.

There are also many screen scrapers that can fill the gap between pixel render and speech by using on-screen OCR.
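A minimal sketch of that hand-off, assuming the placeholder convention I recall from SumatraPDF's external viewer settings: `%1` for the file path and `%p` for the page number. The reader executable and its `--page` flag below are hypothetical; substitute whatever command line your reader actually accepts.

```python
import re


def expand_command(template: str, file_path: str, page: int) -> str:
    """Fill the %1 (file path) and %p (page number) placeholders in one pass,
    so a path that happens to contain "%p" is not substituted a second time."""
    values = {"%1": file_path, "%p": str(page)}
    return re.sub(r"%1|%p", lambda m: values[m.group(0)], template)


if __name__ == "__main__":
    # Hypothetical external reader and flag, for illustration only.
    template = r'"C:\Tools\reader.exe" "%1" --page %p'
    print(expand_command(template, r"C:\Docs\my report.pdf", 12))
```

Quoting `"%1"` in the template matters because paths with spaces would otherwise be split into several arguments.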

image

cary-rowen commented 6 months ago

@GitHubRulesOK

"On Windows the inbuilt Screen Reader is MS Edge (currently 2024 is vIII+) which does a very good job working with both PDF and ePub plus many others but only PDF has good Natural Reading."

I would like to correct that: MS Edge is not a built-in screen reader in Windows. It's just that the browser has a feature that reads web pages or PDF text aloud. Windows' built-in screen reader is called Narrator, and you can turn it on or off by pressing Windows+Ctrl+Enter.

I don't think the reporter's request was for the project to support reading PDF documents consisting entirely of images.

@rkingett wrote:

I am a blind user who is using the free screen reader NVDA. I would really love to use this program but the documents, even though they are text documents, will not read at all with NVDA.

I haven't seen any accessibility efforts from SumatraPDF in all these years.

GitHubRulesOK commented 6 months ago

@cary-rowen Not looking for a bun fight. I accept your correction that the Windows screen reader is Narrator, but AFAIK it cannot access PDF data. If it can, it would be interesting to test the results.

Raw PDF text is NUMERIC (often out of human order) and needs conversion into a semblance of text strings.
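To make that concrete, here is a toy sketch, not SumatraPDF or MuPDF code. It assumes each glyph arrives as a font-specific numeric code plus a position, in arbitrary order, and that a code-to-character table is available; the names `Glyph`, `to_reading_order` and `line_tolerance` are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Glyph(NamedTuple):
    code: int  # font-specific character code, not necessarily Unicode
    x: float   # horizontal position on the page
    y: float   # vertical position (PDF y axis grows upward)


def to_reading_order(glyphs: List[Glyph], to_unicode: Dict[int, str],
                     line_tolerance: float = 2.0) -> str:
    """Group glyphs into lines by y, order each line by x, then decode."""
    lines: List[List[Glyph]] = []
    current: List[Glyph] = []
    current_y = 0.0
    # Highest y first, i.e. top of the page first.
    for g in sorted(glyphs, key=lambda g: -g.y):
        if current and abs(g.y - current_y) > line_tolerance:
            lines.append(current)
            current = []
        if not current:
            current_y = g.y  # first glyph of a line fixes its baseline
        current.append(g)
    if current:
        lines.append(current)
    return "\n".join(
        # Unmapped codes become U+FFFD, the Unicode replacement character.
        "".join(to_unicode.get(g.code, "\ufffd")
                for g in sorted(line, key=lambda g: g.x))
        for line in lines
    )


if __name__ == "__main__":
    glyphs = [Glyph(3, 20, 700), Glyph(1, 0, 700), Glyph(4, 0, 680),
              Glyph(2, 10, 700.5), Glyph(5, 10, 680)]
    table = {1: "H", 2: "i", 3: "!", 4: "O", 5: "K"}
    print(to_reading_order(glyphs, table))
```

Real documents are much harder: multiple columns, rotated text, missing code tables and the lack of explicit spaces all need heuristics on top of this.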

However, the Windows PDF reader is MS Edge, and in the past it would take PDF reader control away from any other 3rd party! I have not tested NVDA or Narrator in Edge to give a comparison; perhaps I should. My initial tests show Narrator in Edge behaves exactly the same as in SumatraPDF: it can read any popup or surrounding data but cannot access the encoded PDF. However, NVDA does seem to work well in Edge (PDFium) and, I guess, other browsers.

image

Acrobat has a built-in, heuristics-based text extractor, designed especially for extracting text from PDF into an accessible "reflow" format that differs from what is native inside most PDFs. You can "save as" such output as a text file. It does not attempt any document formats other than their own PDF [/A] XFA/XMP or otherwise restructured PDFs. Much of that code comes from the Acrobat editor (crippled in Reader DC) and causes much of the slowness and bloat.

image

image

It is not fully clear what the Adobe Reader editor changes: the file is converted from Linux to Mac format(!) and a Mac Quartz header is added(!!), so the file increases in size from 9 KB to 12 KB, but that should make no real difference to it being accessible. I have to guess it is keeping other related NVDA data somewhere else.