mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

PDFJS highlight is not correct. #9306

Open unicorn82 opened 6 years ago

unicorn82 commented 6 years ago

Attach (recommended) or Link to PDF file here: 10pages-noLogo.pdf

Configuration:

Steps to reproduce the problem:

  1. Open PDFJS demo, https://mozilla.github.io/pdf.js/web/viewer.html
  2. Click "Open File" button in header and open PDF attached from local
  3. Search "engine" and highlight all occurrences.
  4. Some keywords are not highlighted correctly, for example on page 6.

What is the expected behavior? (add screenshot) Keywords should be highlighted correctly.

What went wrong? (add screenshot) pdfjs_highlight_issue

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

unicorn82 commented 6 years ago

Any comments or suggestions on this issue?

timvandermeij commented 6 years ago

This has to do with the positioning of the text layer. I think there are some similar issues regarding this behavior, but someone needs to check if they share the same cause.

Ognyshare commented 6 years ago

In text_layer_builder.js, I modified the two while loops in the convertMatches function.

from:

iIndex += textContentItemsStr[i].length;

to:

iIndex += this.findController.normalize(textContentItemsStr[i]).length;

With this change I could resolve a similar problem, but it is not perfect: in some cases the match is still shifted one position to the right.
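To illustrate why the length used to advance the index matters, here is a minimal, self-contained sketch. It is not the pdf.js code: `normalize`, its replacement table, and the `locate` helper are hypothetical stand-ins, and it assumes the search runs over normalized page text while the per-item lengths are summed separately.

```javascript
// Hypothetical stand-in for the viewer's normalization step; the real
// replacement table in pdf.js may differ.
const REPLACEMENTS = { "\u00BD": "1/2", "\u2019": "'" };

function normalize(text) {
  return text.replace(/[\u00BD\u2019]/g, (ch) => REPLACEMENTS[ch]);
}

// Map an offset in the concatenated page text to an item index (divIdx)
// and an offset inside that item. `lengthOf` decides how far each item
// advances the running index.
function locate(items, matchIdx, lengthOf) {
  let i = 0;
  let iIndex = 0;
  const end = items.length - 1;
  while (i !== end && matchIdx >= iIndex + lengthOf(items[i])) {
    iIndex += lengthOf(items[i]);
    i++;
  }
  return { divIdx: i, offset: matchIdx - iIndex };
}

// The first item holds one character (U+00BD) that normalizes to three.
const items = ["\u00BD cup ", "engine oil"];
const pageText = normalize(items.join("")); // "1/2 cup engine oil"
const matchIdx = pageText.indexOf("engine"); // 8

// Raw lengths: the first item counts as 6 characters, so the match
// lands 2 characters too far to the right in the second item.
const raw = locate(items, matchIdx, (s) => s.length);
console.log(raw); // { divIdx: 1, offset: 2 }

// Normalized lengths: the first item counts as 8 characters, so the
// match starts at the beginning of the second item.
const fixed = locate(items, matchIdx, (s) => normalize(s).length);
console.log(fixed); // { divIdx: 1, offset: 0 }
```

In this sketch the returned offset is still counted in normalized characters. If a replaced character sits inside the same item as the match, that offset may not line up with the raw item text, which may be one source of the remaining shift.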

Here is the modified function that I now use in text_layer_builder.js: modified_convertMatches.txt

timvandermeij commented 6 years ago

I think this is related to #9448 then.