mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Issue with Inaccurate Text Highlighting in PDF Search Using ng2-pdfjs-viewer - Version 13 #17154

Closed rajnish21a closed 1 year ago

rajnish21a commented 1 year ago

Configuration:

Web browser and its version: Google Chrome
Operating system and its version: Windows 10 and above
PDF.js version: v3.11.174
Is a browser extension: No
Application Platform: Angular 13 PWA

Issue Description:

I am currently using ng2-pdfjs-viewer version 13 within my application, and overall, it has been working smoothly. However, I have encountered a specific issue when searching for text strings, such as "A10," within PDFs generated using the E3 series.

The problem is that while the search and highlighting functionality generally work correctly for "A10," it also highlights some additional, unintended instances, such as "A/10." This behavior is incorrect; it should only highlight "A10" and not variations like "A/10." I've noticed that similar issues occur when there is a space in between text in the PDF, causing the search to highlight unwanted portions of text.

It's worth noting that these issues are not present when using popular PDF readers like Adobe Acrobat. Upon further investigation, I realized that PDF.js, the PDF reader used by ng2-pdfjs-viewer, and other PDF readers interpret text layers differently, which appears to be the root cause of these inconsistencies.

I would greatly appreciate any assistance or guidance on how to address this issue, as it impacts the accuracy of text highlighting within PDFs generated by the E3 series. Unfortunately, I am unable to provide the PDF for reference, but I am eager to work towards a solution to improve the text highlighting accuracy.

Also, with Whole Word search enabled, only A10 is highlighted. However, (A10) is also a whole word: it is highlighted in every other PDF reader, but not here.
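The behaviour the report expects can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not how PDF.js or ng2-pdfjs-viewer implement search: `findLiteral` and `isWholeWord` are hypothetical helpers, the sample string is made up, and "whole word" is assumed to mean that the neighbouring characters are neither letters nor digits.

```javascript
// Hypothetical helpers for illustration only; not part of PDF.js.

// Return the start index of every literal occurrence of `query` in `text`.
function findLiteral(text, query) {
  const hits = [];
  if (query.length === 0) return hits;
  let from = 0;
  while (true) {
    const i = text.indexOf(query, from);
    if (i === -1) break;
    hits.push(i);
    from = i + 1;
  }
  return hits;
}

// Assumption: a match is a whole word when the characters on both
// sides (if any) are neither letters nor digits.
function isWholeWord(text, start, length) {
  const isWordChar = (ch) => ch !== undefined && /[\p{L}\p{N}]/u.test(ch);
  return !isWordChar(text[start - 1]) && !isWordChar(text[start + length]);
}

// Made-up sample text containing the variants mentioned in the report.
const sample = "A/10 A10 (A10) XA10";
const hits = findLiteral(sample, "A10");
const wholeWordHits = hits.filter((i) => isWholeWord(sample, i, "A10".length));
console.log(hits, wholeWordHits);
```

Under these assumptions, "A/10" is never matched by a literal search for "A10", and the "A10" inside "(A10)" counts as a whole-word match because parentheses are treated as word boundaries, while the one inside "XA10" does not.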

I am duly attaching a test pdf. Search for 14A to replicate the issue.

Thank you for your understanding and support in resolving this matter.

Attachment: Test_PDF.pdf

Snuffleupagus commented 1 year ago

Duplicate of #17007, where you were informed that "ng2-pdfjs-viewer version 13" is not maintained and/or supported here. Hence you'll need to use the PDF.js library directly, without another project on top, in order for us to be able to provide any help.

> The problem is that while the search and highlighting functionality generally work correctly for "A10," it also highlights some additional, unintended instances, such as "A/10."

When searching for "A10" in your attached PDF document, using either Adobe Reader or the PDF.js default viewer, there are no matches for that string. Hence your provided STR (Steps To Reproduce) are unfortunately not possible to follow, nor particularly easy to read. (Please note that our ISSUE_TEMPLATE asks for a list of steps, since that's much easier to follow than "just" text.)

> I am duly attaching a test pdf. Search for 14A to replicate the issue.

Searching for "14A" in that PDF document works just fine when using the PDF.js default viewer.


All in all, I'm sorry to say that this issue unfortunately doesn't look actionable/valid as-is. There's a lot of unrelated/confusing information provided, making it very difficult to understand what you're trying to report.