davidfaraci closed this 3 years ago.
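Some initial thinking. This probably breaks down into two parts: (1) finding matches in each page's text, and (2) highlighting those matches on the rendered page.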
Part 1 is probably not too bad in principle. We process each page to text using `const page = await pdf.getPage(...)` and `page.getTextContent()`, then munge the resulting text a bit (merge all lines, remove end-of-line hyphens, maybe more) to help the search hit its target. We could probably even display search results in a sidebar.
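A minimal sketch of part 1, assuming `pdf` is a pdf.js document proxy and that each text item exposes `str` and `hasEOL` (recent pdf.js versions do; older ones may not). `extractPageText` and `normalizeTextItems` are hypothetical helper names, and the normalization shown (join items, drop end-of-line hyphens, collapse whitespace) is only one possible take on the munging described above.

```javascript
// Extract one page's text via pdf.js and normalize it for searching.
// Assumes `pdf` is a pdf.js PDFDocumentProxy and that text items carry
// `str` and `hasEOL` (an assumption about the pdf.js version in use).
async function extractPageText(pdf, pageNumber) {
  const page = await pdf.getPage(pageNumber);
  const content = await page.getTextContent();
  return normalizeTextItems(content.items);
}

// Hypothetical helper: merge text items into one searchable string.
function normalizeTextItems(items) {
  let text = "";
  for (const item of items) {
    // Skip entries without text (e.g. marked-content markers).
    if (typeof item.str !== "string") continue;
    text += item.str;
    if (item.hasEOL) {
      if (/[A-Za-z]-$/.test(text)) {
        // End-of-line hyphen: drop it so the two word halves join.
        text = text.slice(0, -1);
      } else {
        // Plain line break: join lines with a space.
        text += " ";
      }
    }
  }
  // Collapse runs of whitespace so layout spacing doesn't break matches.
  return text.replace(/\s+/g, " ").trim();
}
```

Dropping every end-of-line hyphen also merges genuine hyphenated compounds that happen to wrap (e.g. "well-" followed by "known" on the next line), which is one reason the exact munging rules are a judgment call.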
For part 2, we probably want to use the `<mark>` element: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/mark. Multiline matches and the like might add quite a bit of complexity, though.
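A minimal sketch of wrapping matches in `<mark>` for a single string of text, such as one span of a text layer. `findMatches`, `highlightHtml`, and `escapeHtml` are hypothetical helpers; matching is case-insensitive and assumes lowercasing does not change string length (true for ASCII, not for every Unicode character).

```javascript
// Escape text so it can be inserted into HTML safely.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Return non-overlapping [start, end) ranges of case-insensitive matches.
// Assumes toLowerCase() preserves length (true for ASCII text).
function findMatches(text, query) {
  const ranges = [];
  if (query.length === 0) return ranges;
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  let from = 0;
  let start;
  while ((start = haystack.indexOf(needle, from)) !== -1) {
    ranges.push([start, start + needle.length]);
    from = start + needle.length;
  }
  return ranges;
}

// Build HTML with every match wrapped in <mark>, escaping everything else.
function highlightHtml(text, query) {
  let html = "";
  let last = 0;
  for (const [start, end] of findMatches(text, query)) {
    html += escapeHtml(text.slice(last, start));
    html += "<mark>" + escapeHtml(text.slice(start, end)) + "</mark>";
    last = end;
  }
  return html + escapeHtml(text.slice(last));
}
```

In a browser text layer, the same `[start, end)` ranges could instead be used to build `mark` elements and text nodes with `document.createElement`, which avoids manual HTML escaping.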
Actually, it looks like even the official PDF.js viewer doesn't support multiline matches in search, so that feature is probably not a good target for an initial implementation.
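Multiline matches are harder because a match can start in one text item and end in the next. One possible approach, sketched under the assumption that items are plain strings joined by a single space: search the joined string, then split each match into per-item ranges so every item's span gets its own `<mark>`. `findCrossItemMatches` is a hypothetical name, not a pdf.js API.

```javascript
// Find case-insensitive matches that may cross item boundaries, and split
// each match into pieces { item, start, end } local to a single item.
// Assumes items are joined with one space and lowercasing preserves length.
function findCrossItemMatches(itemStrings, query) {
  // Record where each item starts in the joined string.
  const starts = [];
  let joined = "";
  itemStrings.forEach((s, i) => {
    if (i > 0) joined += " ";
    starts.push(joined.length);
    joined += s;
  });

  const matches = [];
  const needle = query.toLowerCase();
  if (needle.length === 0) return matches;
  const haystack = joined.toLowerCase();

  let from = 0;
  let start;
  while ((start = haystack.indexOf(needle, from)) !== -1) {
    const end = start + needle.length;
    const pieces = [];
    for (let i = 0; i < itemStrings.length; i++) {
      const itemStart = starts[i];
      const itemEnd = itemStart + itemStrings[i].length;
      // Overlap of the match with this item; the joining space belongs to no item.
      const lo = Math.max(start, itemStart);
      const hi = Math.min(end, itemEnd);
      if (lo < hi) pieces.push({ item: i, start: lo - itemStart, end: hi - itemStart });
    }
    matches.push(pieces);
    from = end;
  }
  return matches;
}
```

If normalization removes characters (as end-of-line hyphen removal does), offsets in the joined string no longer line up with the original items, so a fuller implementation would also need to keep an offset map.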
This is pretty much complete, including multiline matches.
Self-explanatory!