nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/

pdf: NVDA skips private use areas characters #5562

Open surfer0627 opened 8 years ago

surfer0627 commented 8 years ago

NVDA skips some symbols with the code points e18c, e18d, e18e, and e18f. This occurs in NVDA 2014.1 and later versions. Please use the attachments to test it: symbols.docx, symbols.pdf

case1: environment: NVDA 2015.4 (installed); interface language: English; synthesizer: eSpeak; Adobe Reader: XI

STR:

  1. Open file "symbols.docx"
  2. Press down arrow, NVDA reports "symbol2 b"
  3. Open file "symbols.pdf"
  4. Press down arrow, NVDA reports "symbol2"
  5. Press right arrow several times to move to "colon:"
  6. Press right arrow twice, NVDA reports space. (NVDA skips the symbol e18d.)

notes: In NVDA2012.2, the symbols could be detected.

case2: environment: NVDA 2012.2 (portable); interface language: English; synthesizer: eSpeak; Adobe Reader: XI

  1. Open file "symbols.pdf"
  2. Press right arrow several times to move to "colon:"
  3. Press right arrow twice, NVDA reports nothing. (This is because there is a symbol, e18c, at this position.)

notes:
• I could not find NVDA 2013.x, so I have not tested it.
• NVDA 2012.2 (portable) can be downloaded at https://dl.dropboxusercontent.com/u/90288447/nvda2012.2.1.rar

surfer0627 commented 8 years ago

Sorry, I don't know how to attach files. I selected them and submitted, but I could not see any file link on this page.

surfer0627 commented 8 years ago

According to users' investigation, NVDA currently skips "private use area" characters when reading PDF files.

notes: In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium.

definition resources: https://en.wikipedia.org/wiki/Private_Use_Areas
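As a quick check that the code points in this report really fall in a Private Use Area, here is a minimal Python sketch. It relies only on the standard `unicodedata` module, where private use characters have the general category `Co`. The helper name `is_private_use` and the `REPORTED` list are illustrative and not part of NVDA.

```python
import unicodedata

# Code points quoted in the original report.
REPORTED = [0xE18C, 0xE18D, 0xE18E, 0xE18F]


def is_private_use(ch: str) -> bool:
    """Return True if ch has the Unicode general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


for cp in REPORTED:
    # Each reported code point lies in the BMP Private Use Area (U+E000 to U+F8FF).
    print(f"U+{cp:04X} private use: {is_private_use(chr(cp))}")
```

Because these characters have no standard meaning, a screen reader cannot name them; it can only tell the user that something is there.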

jcsteh commented 8 years ago

Because this is the private use area, it is by definition impossible to have standard mappings for them. Therefore, there's nothing useful we can do here.

surfer0627 commented 8 years ago

Is it possible for NVDA to detect the character, as NVDA 2012.2 does?

If users know that there is a private use area character, they could ask a sighted person for help.

jcsteh commented 8 years ago

Sorry. I misunderstood what you were asking for. I thought you were expecting these symbols to have proper names, which is impossible. However, the fact that they aren't present at all recently is another issue entirely.

jcsteh commented 8 years ago

I can confirm this. Change was introduced by 788cefb (#2963, NVDA 2014.1).

@MichaelDCurran: This commit does two things:

  1. VBufBase's nodeHasUsefulContent: rather than calling isWhitespace, write out a for loop directly, and return true if any character is found that is neither whitespace (iswspace) nor from the private use range or a 0-width space (isPrivatecharacter).
  2. VbufStorage_buffer_t::addTextFieldNode: strip private characters from the start and end of the text string if they exist when giving the text to the new text node.

2) is the issue here, as it filters out text nodes which only contain a private Unicode character (which is unfortunately how things tend to get rendered in PDF). The question is: why do we need 2)? 1) should cause browsers to fall back to the label as required because nodeHasUsefulContent will return false. Were you just trying to get rid of pointless nodes or can you remember whether there was some other reason for this?
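The two behaviours described above can be sketched in Python to show why a text node holding only a private use character disappears. This is an illustration only: the real code is C++ inside NVDA's virtual buffer library, the function names below are hypothetical stand-ins, and the range U+E000 to U+F8FF is an assumption about what the private character check covers.

```python
# Assumed range: the BMP Private Use Area, plus the zero-width space.
PRIVATE_USE_START, PRIVATE_USE_END = 0xE000, 0xF8FF
ZERO_WIDTH_SPACE = "\u200b"


def is_private_character(ch: str) -> bool:
    """Stand-in for the private character check mentioned in the commit."""
    return PRIVATE_USE_START <= ord(ch) <= PRIVATE_USE_END or ch == ZERO_WIDTH_SPACE


def has_useful_content(text: str) -> bool:
    """Behaviour 1: True if any character is neither whitespace nor private."""
    return any(not ch.isspace() and not is_private_character(ch) for ch in text)


def strip_private(text: str) -> str:
    """Behaviour 2: drop private characters from both ends of the text."""
    start, end = 0, len(text)
    while start < end and is_private_character(text[start]):
        start += 1
    while end > start and is_private_character(text[end - 1]):
        end -= 1
    return text[start:end]


pdf_node_text = "\ue18d"  # a PDF text node holding only one private use symbol
print(has_useful_content(pdf_node_text))   # False
print(repr(strip_private(pdf_node_text)))  # '' : nothing is left to navigate to
```

In this sketch, behaviour 1 alone already reports the node as having no useful content, while behaviour 2 is what empties the text so that the cursor has nothing to land on.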

michaelDCurran commented 8 years ago

I guess it was that when we fall back to a label, it is appended, rather than replacing the content. For example, a button with a private use char would then get rendered as the private use char + the label. At the time that looked funny.

However, if it breaks something, then there is no technical reason I can think of why it needs to be removed.

surfer0627 commented 8 years ago

symbols.docx

surfer0627 commented 8 years ago

symbols.pdf

bhavyashah commented 7 years ago

@jcsteh https://github.com/nvaccess/nvda/issues/5562#issuecomment-162412558 suggests that you are able to successfully reproduce the reported issue and are aware of the causative factors of this regression. Could you and @surfer0627 please check if this bug still stands in the latest version of Acrobat Reader?

surfer0627 commented 7 years ago

@bhavyashah: I could still reproduce this in Adobe Acrobat Reader DC 17.012.20093 - Chinese Traditional.

Adriani90 commented 5 years ago

I can still reproduce this issue in NVDA alpha-16768,a6f7fb40 with Adobe Reader 19.010.20091.

surfer0627 commented 4 years ago

NVDA 2019.3.1 has now been released.

We still need to use version 2013.3 to read private use area characters in PDF files with Acrobat Reader.

Is it possible to have a try build to fix this issue temporarily?

I do not know how to do this myself: (git revert 788cefb) or something else.

Thank you for all of your help.

feerrenrut commented 4 years ago

This is more complicated than just reverting the change. It's hard to say whether this is a regression or not, since this was initially changed for #2963. I found the description of this issue hard to follow, so I'll attempt to describe it in my own words:

While reading a PDF and encountering a "private use character" without a label, something should be reported so that the user can detect the character's presence and ask for help.

However, it seems that if we fixed this in the way suggested by jcsteh's comment, we would end up with noise being added in cases where a label exists. Ideally, there is a label that should replace these characters, and they don't have to be rendered.

Adriani90 commented 1 year ago

Suggestion: use the speech refactor feature to add a beep or a short sound that indicates a PUA symbol. However, this should apply only in the PDF virtual document. In MS Word, for example, PUA bullets are mapped to Unicode, so NVDA would no longer report these if we changed the behavior for Microsoft Word as well.
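A rough sketch of this suggestion, under stated assumptions: `PuaBeep` is a hypothetical marker standing in for whatever sound command NVDA's speech pipeline would use, `build_sequence` is an illustrative name, and the check covers only the BMP range U+E000 to U+F8FF. Text outside a PDF document is passed through unchanged, so the Word behaviour is not affected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PuaBeep:
    """Hypothetical marker for a short sound; not an NVDA class."""
    hz: int = 880
    length_ms: int = 40


def is_bmp_private_use(ch: str) -> bool:
    """Assumed check: BMP Private Use Area only."""
    return 0xE000 <= ord(ch) <= 0xF8FF


def build_sequence(text: str, in_pdf_document: bool) -> list:
    """Split text into spoken runs, inserting a beep marker per PUA symbol."""
    if not in_pdf_document:
        # Leave other documents (for example Word) untouched.
        return [text] if text else []
    sequence, run = [], []
    for ch in text:
        if is_bmp_private_use(ch):
            if run:
                sequence.append("".join(run))
                run = []
            sequence.append(PuaBeep())
        else:
            run.append(ch)
    if run:
        sequence.append("".join(run))
    return sequence


print(build_sequence("colon:\ue18d end", in_pdf_document=True))
```

Emitting a marker instead of dropping the character keeps the symbol reachable with the arrow keys while avoiding any attempt to name it.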