stephanrauh / ngx-extended-pdf-viewer

A full-blown PDF viewer for Angular 16, 17, and beyond
https://pdfviewer.net
Apache License 2.0

PDF Viewer Text Selection Bug #2277

Closed waabri closed 2 months ago

waabri commented 2 months ago

Hello, we would like to report an issue that my colleague and I discovered yesterday regarding text selection in the PDF viewer. When selecting a row of text, the selection jumps to other parts of the document or selects all the text on the page.

https://github.com/stephanrauh/ngx-extended-pdf-viewer/assets/166471475/a395cc0b-a64f-4cda-b92e-3afec0cddb95

When we tried text selection in other browsers' native PDF viewers, it worked fine.

https://github.com/stephanrauh/ngx-extended-pdf-viewer/assets/166471475/7ddee193-98d6-4c58-b6bb-9ded9e44566e

Thank you!

waabri commented 2 months ago

Digital_document(2)(1).pdf

I forgot to include the file haha, my bad

stephanrauh commented 2 months ago

The PDF viewer has problems with your document. The root problem is that the text layer uses the fonts available in the browser (and known to pdf.js). These fonts are different from the native PDF fonts, so there's always a small mismatch. However, the pdf.js team have tweaked their algorithm to the point that you'll hardly ever notice.
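To make the mismatch a bit more concrete, here is a minimal sketch. It assumes the text layer compensates by stretching each span horizontally until it is as wide as the text drawn from the PDF font. The helper names `horizontalScale` and `mismatch` and the pixel widths are made up for illustration; this is not the actual pdf.js code.

```typescript
// Hypothetical helper: the horizontal stretch needed so that text rendered
// in a browser fallback font spans the same width as the original PDF glyphs.
function horizontalScale(pdfWidth: number, fallbackWidth: number): number {
  if (fallbackWidth <= 0) {
    return 1; // nothing measurable, leave the span unscaled
  }
  return pdfWidth / fallbackWidth;
}

// Width difference if no scaling is applied: positive means the fallback
// text is too short, negative means it overshoots.
function mismatch(pdfWidth: number, fallbackWidth: number): number {
  return pdfWidth - fallbackWidth;
}

// Made-up numbers: the PDF draws the text 120px wide, while the browser
// font only needs 100px for the same characters.
const scale = horizontalScale(120, 100);
console.log(`stretch factor: ${scale}`);
console.log(`unscaled mismatch in px: ${mismatch(120, 100)}`);
```

Even with such a correction, each span only matches the width of the text. Letter positions inside the span and the spacing between neighbouring spans can still be slightly off.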

Open your document in the text layer demo (https://pdfviewer.net/extended-pdf-viewer/textlayer) and activate the toggle "put each text layer into a box". Then you'll see the <span> tags you select when marking text:

image

The "select everything" effect happens when the mouse is between the <span> tags. I've put some of these gaps in a blue circle.
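The gap condition can be sketched as a simple hit test. This is only an illustration under assumptions: the `Box` type is a simplified stand-in for what `getBoundingClientRect()` returns for each <span>, the coordinates are invented, and `isInGap` is a hypothetical helper, not part of pdf.js or ngx-extended-pdf-viewer.

```typescript
// Hypothetical box type: a simplified stand-in for a span's bounding rectangle.
interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// True when the point lies inside the box (edges included).
function contains(box: Box, x: number, y: number): boolean {
  return x >= box.left && x <= box.right && y >= box.top && y <= box.bottom;
}

// True when the pointer is over none of the span boxes, i.e. in a gap.
function isInGap(spans: Box[], x: number, y: number): boolean {
  return !spans.some((box) => contains(box, x, y));
}

// Two made-up spans on the same line with a 10px gap between them.
const spans: Box[] = [
  { left: 0, top: 0, right: 100, bottom: 20 },
  { left: 110, top: 0, right: 200, bottom: 20 },
];

console.log(isInGap(spans, 50, 10)); // pointer over the first span
console.log(isInGap(spans, 105, 10)); // pointer between the two spans
```

In a browser you could probably build the `spans` array from the elements matched by `.textLayer span` and feed in the coordinates of a mouse event to check whether a given selection started in such a gap.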

stephanrauh commented 2 months ago

Here's an issue that may shed some light on the topic: https://github.com/stephanrauh/ngx-extended-pdf-viewer/issues/694#issuecomment-800225381

In the meantime, the old "textLayerMode=2" became the default and was later replaced/renamed (?) by textLayerMode=1. To make the confusion perfect, a new textLayerMode=2 with a different meaning was implemented a year ago.

I'm closing this ticket now because there's nothing I can do. However, you can send your PDF file to the pdf.js team and ask them to improve either the marking or the text layer alignment. Just make sure you don't mention my project, because pdf.js only accepts bugs that show up in Firefox.

Good luck! Stephan