stephanrauh / ngx-extended-pdf-viewer

A full-blown PDF viewer for Angular 16, 17, and beyond
https://pdfviewer.net
Apache License 2.0
473 stars, 181 forks

Add a way to include exceptions in the findMultiple method #1743

Closed feliperossitunts closed 1 year ago

feliperossitunts commented 1 year ago

Problem description: I needed to filter the highlighted words in some contexts, but I didn't find a simple way to do it. Example: I search for the word "heal", but I don't want it highlighted in the context "heal this". The highlight should simply ignore the word in that context.

Possible solution and alternatives: I imagine something like this: findMultiple(text: Array<string>, exceptions?: RegExp | Array<string>, options?: FindOptions): boolean; Then, when I trigger the findNext() method, a word that fits one of the exceptions is ignored (even if it matches).

I hope I could make it clear. Thanks!

stephanrauh commented 1 year ago

Yes, I see where you're heading. Unfortunately, the find algorithm has evolved a lot in pdf.js, and my fork is stuck on an outdated version because updating it is a hell of a merge conflict. So I'm not sure how to continue. I'd like to drop my implementation to be able to benefit from the progress of the pdf.js project.

Or maybe I won't, because you've reminded me of the programmatic API. That's a popular feature, so I don't want to lose it. Maybe I (and you all) just have to live with the sad fact that we don't benefit from the improvements.

In the meantime, I suggest you implement your feature yourself. It's easier than you might think.

If you're interested in the search results but don't care about showing them in the UI, use getPageAsText():

const pageNumber = 1;
const text = await this.pdfViewerService.getPageAsText(pageNumber);
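
Once you have the page text, the exception filter from the feature request can be written as a plain function. The following is only a minimal sketch, not part of ngx-extended-pdf-viewer: the name findWithExceptions, the Match interface, and the case-insensitive matching are all assumptions made for illustration. It drops every match that lies completely inside an occurrence of an exception phrase.

```typescript
// Hypothetical helper, not an API of ngx-extended-pdf-viewer or pdf.js.
interface Match {
  term: string;
  start: number; // offset of the first character of the match
  end: number; // offset just past the last character of the match
}

function findWithExceptions(
  text: string,
  terms: string[],
  exceptions: string[],
): Match[] {
  // Assumption: case-insensitive search. Offsets refer to the lowercased
  // text, which has the same length as the original for most scripts.
  const haystack = text.toLowerCase();

  // Collect the character ranges covered by any exception phrase.
  const blocked: Array<[number, number]> = [];
  for (const phrase of exceptions) {
    const needle = phrase.toLowerCase();
    if (needle.length === 0) continue;
    let from = haystack.indexOf(needle);
    while (from !== -1) {
      blocked.push([from, from + needle.length]);
      from = haystack.indexOf(needle, from + 1);
    }
  }

  const matches: Match[] = [];
  for (const term of terms) {
    const needle = term.toLowerCase();
    if (needle.length === 0) continue;
    let start = haystack.indexOf(needle);
    while (start !== -1) {
      const end = start + needle.length;
      // Keep the match only if it is not fully inside an exception phrase.
      const isBlocked = blocked.some(([s, e]) => start >= s && end <= e);
      if (!isBlocked) matches.push({ term, start, end });
      start = haystack.indexOf(needle, start + 1);
    }
  }
  // Return the matches in reading order.
  return matches.sort((a, b) => a.start - b.start);
}
```

Called as findWithExceptions(text, ["heal"], ["heal this"]), it skips the "heal" in "heal this" and keeps every other occurrence. Support for a RegExp as exception, as proposed above, is left out to keep the sketch short.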

If you're interested in displaying the search results, but don't need the list of search results to be complete, examine the text layer:

<ngx-extended-pdf-viewer
  [src]="..."
  (textLayerRendered)="highlightSearchResults($event)">
</ngx-extended-pdf-viewer>

  // the method name must match the handler bound in the template above
  public highlightSearchResults(event: TextLayerRenderedEvent): void {
    // each entry is one <span> of the text layer
    const textLayer: Array<HTMLSpanElement> = event.source.textLayer.textDivs;
    // implement your find and highlight algorithm here
  }

This code snippet relies on the way pdf.js implements highlighting search results. The PDF file is simply a canvas, but pdf.js renders an invisible text layer above the canvas. The text layer contains the same text as the canvas, but usually with the wrong font, so it's only a close approximation to the real layout. Because it contains the same text as the PDF file, you can analyse the text layer, extract the text from it, and modify it by adding additional <span> tags. That's precisely what findMultiple() does. It wraps the search results in the text layer with a <span> and a CSS class highlighting the text.
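
The wrapping step described above can be sketched without touching the DOM. This is an illustration under assumptions, not the code findMultiple() uses: the function names are hypothetical, and the default class name "highlight" is an assumption about the pdf.js stylesheet that you should check against your version. Given the text of one span and the character ranges to highlight, it produces the HTML for that span.

```typescript
// Hypothetical helpers for illustration; not taken from pdf.js.
interface Segment {
  text: string;
  highlighted: boolean;
}

// Splits `text` into plain and highlighted pieces.
// `ranges` are [start, end) offsets, assumed sorted and non-overlapping.
function splitIntoSegments(
  text: string,
  ranges: Array<[number, number]>,
): Segment[] {
  const segments: Segment[] = [];
  let cursor = 0;
  for (const [start, end] of ranges) {
    if (start > cursor) {
      segments.push({ text: text.slice(cursor, start), highlighted: false });
    }
    segments.push({ text: text.slice(start, end), highlighted: true });
    cursor = end;
  }
  if (cursor < text.length) {
    segments.push({ text: text.slice(cursor), highlighted: false });
  }
  return segments;
}

// Escapes the characters that would otherwise be parsed as markup.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wraps every highlighted piece in a <span> carrying the CSS class.
// Assumption: "highlight" is the class your stylesheet uses for matches.
function toHighlightedHtml(
  text: string,
  ranges: Array<[number, number]>,
  cssClass: string = "highlight",
): string {
  return splitIntoSegments(text, ranges)
    .map((seg) =>
      seg.highlighted
        ? `<span class="${cssClass}">${escapeHtml(seg.text)}</span>`
        : escapeHtml(seg.text),
    )
    .join("");
}
```

In the event handler, you could compute the ranges for each span's textContent and assign the result to its innerHTML. Note that this per-span approach does not find matches that stretch across two spans of the text layer.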

stephanrauh commented 1 year ago

I'm closing the ticket now because I don't believe I'm going to implement this feature. However, I hope I've enabled you to do it yourself.

I'd like to hear from you. How do you like my idea? Does it work? If it doesn't, what can I do to help you make it work?

Best regards, Stephan