mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Highlighted searched text not aligned properly #7876

Closed bhaskargoyal closed 3 years ago

bhaskargoyal commented 7 years ago

Link to PDF file (or attach file here): online demo

Configuration:

Steps to reproduce the problem:

  1. Open the demo site, as stated in README.md
  2. Click on Search icon.
  3. Search "the".
  4. Click on "Highlight all" option.
  5. You may see the misalignment of highlighted text.

What is the expected behavior? (add screenshot) The searched text should be highlighted without any misalignment.

What went wrong? (add screenshot) capture

Issue related to the screenshot

You can see that as the x coordinate increases across the page, the selected text "the" and its highlight get further apart, as if the offset were a function of the x coordinate.
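An offset that grows with the x coordinate is what one would expect if a small per-character width mismatch accumulated along a line. The sketch below is only a simplified model of that arithmetic, not pdf.js code; the function name, the fixed per-character widths, and the sample numbers are assumptions chosen for illustration.

```typescript
// Illustrative model only, not taken from pdf.js.
// Assumption: every character on a line is laid out with one fixed width on the
// rendered page and a slightly different fixed width in the highlight layer.
function highlightDrift(
  charsBefore: number,      // characters between the line start and the match
  renderedCharWidth: number, // hypothetical width per character on the page (px)
  layerCharWidth: number     // hypothetical width per character in the highlight layer (px)
): number {
  // The mismatch of each preceding character adds up, so drift is linear in position.
  return charsBefore * (layerCharWidth - renderedCharWidth);
}

// Hypothetical values: a 0.25 px mismatch per character.
for (const charsBefore of [0, 20, 40, 80]) {
  console.log(`chars before match: ${charsBefore}, drift (px): ${highlightDrift(charsBefore, 6.0, 6.25)}`);
}
```

Under this model a match near the left margin lines up almost exactly, while one far to the right is visibly shifted, which matches the pattern described in the report.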

timvandermeij commented 7 years ago

This has improved quite a bit after #7879, but it's not perfect yet.

ghost commented 7 years ago

I have just noticed the same problem while trying to figure out how to use the search parameter.

You can reproduce it with the currently deployed demo (use 1.6.47?): https://mozilla.github.io/pdf.js/web/viewer.html#search=Specialization
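For anyone wanting to reproduce this with other search terms, a minimal sketch of building such a link follows. It assumes only the `#search=` hash form shown in the URL above; the helper name `buildSearchUrl` is hypothetical, and percent-encoding the term is a precaution for terms containing spaces or special characters.

```typescript
// Hypothetical helper: appends a "#search=<term>" hash to a viewer URL,
// mirroring the form of the demo link quoted in this comment.
function buildSearchUrl(viewerUrl: string, term: string): string {
  // encodeURIComponent escapes spaces and reserved characters in the term.
  return `${viewerUrl}#search=${encodeURIComponent(term)}`;
}

const demoViewer = "https://mozilla.github.io/pdf.js/web/viewer.html";
console.log(buildSearchUrl(demoViewer, "Specialization"));
console.log(buildSearchUrl(demoViewer, "the"));
```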

Most occurrences are OK, but for a few of them the highlight looks off by one character (in Chrome and Firefox; it seems to be OK in Explorer).

Edit: I wanted to show the issue to a coworker, and the problem is not visible on his computer using the very same version of Chrome (55.0.2883.87), with browser zoom at 100%. So I will attach an image as an example of what I see on my computer; see the second occurrence at the bottom of the image: hightlight

ghost commented 7 years ago

Side note:

This is not the case with the "Trace-based... " PDF sample, but I have observed other highlight shifts (vertical, for example). These cases seem to be the result of the OCR technology used to scan the source images of books, so not much can be done, as it is impossible for the OCR to reproduce the font in all cases.

In this sample, the shift is also visible in native Adobe Acrobat (the ghosting of text in PDF.js in the right sample is another, indirectly related issue that occurs when we use the browser text search instead of the pdf.js text search):

capture_ecran 01-30-17 at 11 37 am

urmary commented 6 years ago

Hi, is anyone looking at this issue? It becomes more apparent if the font is changed to MS Gothic. I searched for the word "the" and selected "Highlight all". Below is a screenshot showing the results:

gothic_search for the

xaviervansteene commented 5 years ago

I had the same problem. I set 'text-spacing' to 'initial' on the textLayer class to be sure the text is not extended. It works well for me.
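A rough sketch of how such an override could be expressed is shown below. The property name and value are copied verbatim from the comment above and have not been verified; the `.textLayer` selector is inferred from the "textLayer class" wording, and the helper `buildOverrideRule` is hypothetical.

```typescript
// Hypothetical helper: formats a single-declaration CSS rule as text.
function buildOverrideRule(selector: string, property: string, value: string): string {
  return `${selector} { ${property}: ${value}; }`;
}

// Property and value exactly as reported in the comment (unverified).
const overrideRule = buildOverrideRule(".textLayer", "text-spacing", "initial");
console.log(overrideRule);
```

The resulting rule text could then be placed in a stylesheet loaded after the viewer's own CSS so that it takes precedence.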

retrazil commented 5 years ago

This issue still persists in the online demo.

timvandermeij commented 3 years ago

Closing since this is fixed after recent search and text layer fixes.