opentower / populus-viewer

A Social Annotation Tool Powered by Matrix
https://opentower.github.io/populus-viewer
GNU Affero General Public License v3.0

Search within PDF #60

Closed davidfaraci closed 3 years ago

davidfaraci commented 3 years ago

Self-explanatory!

gleachkr commented 3 years ago

Some initial thinking. This probably factors into two things:

  1. finding a list of matches in the pdf text, with associated page numbers; and
  2. highlighting the n-th match on a page, as you tab through the matches.

1 is probably not too bad in principle. We convert each page to text, using const page = await pdf.getPage(...) and page.getTextContent(), and then munge the resulting text a bit (merge all lines, remove end-of-line hyphens, maybe more) to help the search hit its target. We could probably even display search results in a sidebar.
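
A rough sketch of what step 1 could look like, not the actual implementation. The helper names normalizePageText and findMatches are hypothetical, and the input is assumed to be one array of text lines per page; in the viewer those lines would presumably come from the str fields of the items returned by page.getTextContent(). The matching is a plain case-insensitive substring search, which is an assumption rather than a decision made in this thread.

```javascript
// Hypothetical helper: join the text lines of one page into a single
// searchable string. A line ending in a hyphen is glued to the next line
// (dropping the hyphen); other lines are joined with a single space.
// Caveat: this also merges genuine hyphenated compounds that happen to
// break across lines ("well-" + "known" becomes "wellknown").
function normalizePageText(lines) {
  let text = "";
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue; // skip empty lines
    if (text.endsWith("-")) {
      text = text.slice(0, -1) + line; // remove end-of-line hyphen
    } else {
      text += (text === "" ? "" : " ") + line;
    }
  }
  return text;
}

// Hypothetical helper: case-insensitive search over all pages.
// `pages` is an array of pages, each an array of line strings.
// Returns one entry per match, with a 1-based page number and the
// offset of the match in that page's normalized, lowercased text.
function findMatches(pages, query) {
  const matches = [];
  const needle = query.toLowerCase();
  if (needle === "") return matches;
  pages.forEach((lines, i) => {
    const haystack = normalizePageText(lines).toLowerCase();
    let from = haystack.indexOf(needle);
    while (from !== -1) {
      matches.push({ page: i + 1, index: from });
      from = haystack.indexOf(needle, from + 1);
    }
  });
  return matches;
}
```

Because the lines are merged before searching, a query such as "brown fox" can still match when "brown" ends one line and "fox" starts the next. The resulting list of { page, index } entries is the sort of thing a results sidebar could be built from.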

For 2, we probably want to use the mark element: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/mark. Multiline matches and so on might add quite a bit of complexity to this, though.
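
To make the mark idea concrete, here is a minimal sketch that wraps the n-th match inside a single string of text. The names escapeHtml and markNthMatch are hypothetical, n is assumed to be 0-based, and the function returns an HTML string rather than touching the DOM. It deliberately ignores the hard case: a match that spans several text-layer elements or lines would need one mark per element.

```javascript
// Escape the characters that matter when placing text inside HTML.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: return `text` as HTML with the n-th (0-based)
// case-insensitive occurrence of `query` wrapped in <mark>...</mark>.
// If there is no such occurrence, the escaped text is returned unmarked.
// Assumes lowercasing does not change string length, which holds for
// most text but not for every Unicode character.
function markNthMatch(text, query, n) {
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  if (needle === "" || n < 0) return escapeHtml(text);
  let index = haystack.indexOf(needle);
  for (let seen = 0; index !== -1 && seen < n; seen++) {
    index = haystack.indexOf(needle, index + 1);
  }
  if (index === -1) return escapeHtml(text);
  const end = index + needle.length;
  return (
    escapeHtml(text.slice(0, index)) +
    "<mark>" + escapeHtml(text.slice(index, end)) + "</mark>" +
    escapeHtml(text.slice(end))
  );
}
```

Tabbing through matches would then amount to re-rendering with n incremented, and clearing the highlight is just rendering the escaped text without a mark.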

gleachkr commented 3 years ago

And actually, it looks like even the official pdfjs viewer doesn't do multiline matches in search, so maybe that particular feature is not a good target for an initial implementation.

gleachkr commented 3 years ago

This is pretty much complete, including multiline.