Closed dashakureru closed 1 year ago
Quite literally, the highlight is not wrapping the right part of the string, although it covers the right number of characters. So at least we know it's not just offset visually in a weird way. My guess is that some indexing change happened? Though this might be a pdf.js issue?
Oh no! I'm already familiar with this kind of bug... it always amounts to a difficult debugging session. Maybe you can give me a hint. Do you happen to know which version introduced the bug?
Just in case you want to dig into the code yourself: a likely candidate is the normalize method. It has changed recently to accommodate several non-Latin languages. It's possible that these changes have side effects I didn't consider when merging.
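To illustrate why a change to text normalization can shift a highlight while keeping its length, here is a minimal sketch. It is not the pdf.js normalize method; `normalizeWithMap` and `findInOriginal` are hypothetical helpers, and NFKD normalization is only an assumed stand-in for whatever the real method does. The idea: if the search runs on a normalized string whose length differs from the original (for example because a ligature is expanded into two letters), the match index has to be mapped back to the original text, or the highlight lands a few characters off.

```typescript
// Hypothetical helper, NOT the real pdf.js normalize(): normalizes each
// UTF-16 code unit with NFKD and records, for every character of the
// normalized string, the index of the original character it came from.
function normalizeWithMap(original: string): { normalized: string; map: number[] } {
  let normalized = "";
  const map: number[] = [];
  for (let i = 0; i < original.length; i++) {
    const piece = original[i].normalize("NFKD");
    for (let j = 0; j < piece.length; j++) {
      map.push(i);
    }
    normalized += piece;
  }
  return { normalized, map };
}

// Searches in the normalized text, then translates the hit back to a
// [start, end) range in the original string.
function findInOriginal(original: string, query: string): [number, number] | null {
  if (query.length === 0) {
    return null;
  }
  const { normalized, map } = normalizeWithMap(original);
  const hit = normalized.indexOf(query);
  if (hit === -1) {
    return null;
  }
  const start = map[hit];
  const end = map[hit + query.length - 1] + 1;
  return [start, end];
}

// Sample text with two ligatures (U+FB01 "fi" and U+FB02 "fl") before the match.
const sample = "\uFB01ve \uFB02ags office";
const naiveIndex = sample.normalize("NFKD").indexOf("office");
const mappedRange = findInOriginal(sample, "office");
console.log(naiveIndex, mappedRange);
```

With this sample, using the index from the normalized string directly would highlight six characters starting two positions too late, which is the same shape as the bug described above: right length, wrong place.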
Hi! Sorry it took a while to get back. I'm not sure yet which version introduced the bug, but I used 16.2.5. I'll try to find the last working version.
Hi again! The last working version is 16.1.0.
So it's the update from pdf.js 3.3 to 3.4. Thanks! I'm sure that'll help me.
@korydondzila Kory, thanks for adding your insight in this and several other issues. I appreciate this a lot!
Interesting - looking for "James Boyle" shows the correct result, but "enclosing the commons" is four characters off.
Your bugfix has landed with version 16.2.16.
Enjoy! Stephan
Looks like this issue wasn't completely fixed. It works for a few words, but the highlight is still off for certain matches.
What? Oh, wait - I didn't update the showcase. Maybe that's the problem? On my machine, it looks better:
@avinashgazula
I tested the latest version in my application. It seems to work for some words, but it is off by a few characters for others. I've searched for "financial" here
The PDF viewer renders the text as an image, and adds an invisible text layer above it. The text layer is used for highlighting search results and for marking text. The problem is that both layers are rendered independently. For technical reasons, they usually aren't identical. For example, your computer probably doesn't have the font the PDF file needs. So pdf.js is using a lot of heuristics and guesswork to provide a good match. Most of the time, this works, but sometimes, it doesn't. Being off half a character or even more isn't unusual.
This means that I only consider it an error if you can show me in the HTML code that the wrong text is marked. Kory demonstrated the idea nicely in her comment above. Open the developer tools, find the highlighted text in the DOM, and check whether the <span> responsible for highlighting covers the correct text or not.
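The DOM check described above can also be scripted. The following is only a sketch under stated assumptions: it assumes that search hits are wrapped in elements with the CSS class `highlight` inside a container with the class `textLayer`, so verify the selector in your own developer tools before relying on it. `collectHighlightedText` is a hypothetical helper, not part of pdf.js or ngx-extended-pdf-viewer.

```typescript
// Minimal structural types, so the helper accepts `document` in a browser
// as well as a plain stand-in object.
interface TextNodeLike {
  textContent: string | null;
}

interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<TextNodeLike>;
}

// Assumption: highlighted search hits carry the class "highlight" and live
// inside the text layer. Adjust the selector if your DOM looks different.
const HIGHLIGHT_SELECTOR = ".textLayer .highlight";

// Returns the text covered by each highlight element, in document order.
function collectHighlightedText(
  root: QueryRoot,
  selector: string = HIGHLIGHT_SELECTOR
): string[] {
  return Array.from(root.querySelectorAll(selector), (el) => el.textContent ?? "");
}

// Stand-in for a page where one match is split across two highlight elements.
const fakeRoot: QueryRoot = {
  querySelectorAll: () => [{ textContent: "finan" }, { textContent: "cial" }],
};
const covered = collectHighlightedText(fakeRoot);
console.log(covered.join(""));
```

In a browser, calling `collectHighlightedText(document)` after a search lists what the highlight elements actually contain. If that text equals the search term, the text layer is marked correctly and any remaining offset is a rendering mismatch between the two layers rather than an indexing bug.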
Thanks in advance Stephan
Looks like the span is applied to the right characters. The visible highlight is just off by a few characters on my PDFs.
OK, I'm closing the issue again. I suppose you can reproduce the remaining issue on https://mozilla.github.io/pdf.js/web/viewer.html. If so, please file an issue at https://github.com/mozilla/pdf.js/issues.
When you do so, please be aware that the pdf.js team receives an incredible number of issues every day, so they're triaging strictly. Fill out their bug report form meticulously. And keep in mind that pdf.js is the PDF viewer of Firefox. I'm using it to display PDF files in Angular, and they tolerate that, but only to a certain point. If you mention ngx-extended-pdf-viewer, they'll close the ticket without even looking at it. Report a Firefox bug, and you'll probably be fine.
Describe the bug When using the search feature from the toolbar, the highlighted area is incorrect. (I see the same issue with the PDFs in the documentation.)
Version info
Desktop (please complete the following information):
Smartphone (please complete the following information):
To Reproduce
Here's another example, from this document: https://pdfviewer.net/extended-pdf-viewer/pages-loaded
Screenshots See above :)
Additional context None :)
Thank you! I know from experience how much work filling a form like this is. It's always tedious and annoying. But it helps me to focus on the important points and to speed up development. So thank you very much for your understanding and your patience!