mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Text Extraction from pdf with pdfjs is inaccurate in Vue3 #15328

Clastine closed this issue 2 years ago

Clastine commented 2 years ago

Attach (recommended) or Link to PDF file here:

Configuration: PDF.js version: pdfjs-dist: 2.14.305 and vuejs 3

Steps to reproduce the problem:

  1. Render all pages of pdf with Text layer

What is the expected behavior? (add screenshot) The highlighted text should be extracted accurately with window.getSelection.

What went wrong? (add screenshot) Screenshots: highlighted text; extracted text. When I extract the highlighted text from the PDF rendered with pdfjs using window.getSelection, the start and end positions of the extracted text are a few characters after and before where they should be, and white space is also selected.
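For reference, a minimal sketch of how the selected text might be read and cleaned up. The helper name `normalizeSelectedText` is hypothetical and not part of pdf.js; it only assumes an object with a `toString()` method, such as the value returned by `window.getSelection()`. It addresses the stray white space only, not the shifted start and end positions.

```javascript
// Hypothetical helper, not part of pdf.js: clean up the string produced by a
// Selection-like object (anything with toString(), e.g. window.getSelection()).
function normalizeSelectedText(selection) {
  if (!selection) {
    return "";
  }
  return selection
    .toString()
    .replace(/\s+/g, " ") // collapse runs of spaces, tabs, and newlines
    .trim(); // drop leading and trailing white space
}
```

In a browser this would be called as `normalizeSelectedText(window.getSelection())` after the user has highlighted text in the text layer.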

Snuffleupagus commented 2 years ago

Attach (recommended) or Link to PDF file here:

This part of the issue template is required when opening an issue, since as-is the issue is unfortunately not actionable.

PDF.js version: pdfjs-dist: 2.14.305

That version is no longer supported; please find the latest releases at https://mozilla.github.io/pdf.js/getting_started/#download

and vuejs 3

Sorry, but we don't know anything about "vuejs 3" (since we don't use it here) and thus cannot provide any help/support for it.

Steps to reproduce the problem:

  1. Render all pages of pdf with Text layer

Please see https://github.com/mozilla/pdf.js/blob/master/.github/CONTRIBUTING.md (emphasis mine):

If you are developing a custom solution, first check the examples at https://github.com/mozilla/pdf.js#learning and search existing issues. If this does not help, please prepare a short well-documented example that demonstrates the problem and make it accessible online on your website, JS Bin, GitHub, etc. before opening a new issue or contacting us in the Matrix room -- keep in mind that just code snippets won't help us troubleshoot the problem.
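As an illustration of what such a short example could contain, here is a sketch that extracts the text of a single page so it can be compared with what the selection returns. The function name `extractPageText` is hypothetical; `pdf` is assumed to be the document object resolved from `pdfjsLib.getDocument(url).promise`, and joining items with a single space is an assumption that will not reproduce the original layout exactly.

```javascript
// Sketch only: `pdf` is assumed to be a loaded PDF.js document, i.e. the value
// resolved from pdfjsLib.getDocument(url).promise.
async function extractPageText(pdf, pageNumber) {
  const page = await pdf.getPage(pageNumber);
  const textContent = await page.getTextContent();
  // Each text item carries its string in `str`; items without one are skipped.
  return textContent.items
    .map((item) => item.str ?? "")
    .filter((str) => str.length > 0)
    .join(" ");
}
```

Logging this string next to the result of `window.getSelection().toString()` for the same page would show whether the mismatch comes from the text layer or from the surrounding application code.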