Highlighter Can't Highlight All the Text in the Document

neuml / txtmarker

Highlight text in documents

Apache License 2.0

73 stars, 11 forks

Thank you for reporting the issue.

From the sound of it, you used txtai to extract text from a PDF document and are now trying to highlight that text, as in this example: https://github.com/neuml/txtmarker/blob/master/examples/02_Highlighting_with_Transformers.ipynb. Is that right?

It's hard to tell exactly, but it's possible that more needs to be done to clean control characters out of the extracted text before it is passed to the highlighter.
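As a rough sketch of what that cleanup could look like, the snippet below uses only the Python standard library. The function name `clean_text` and the choice of which Unicode categories to drop are assumptions for illustration, not part of txtmarker or txtai, and the right set of characters to remove will depend on the document.

```python
import unicodedata


def clean_text(text):
    """Hypothetical helper: strip control and other non-printing characters."""
    chars = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cc":
            # Control characters (newlines, tabs, form feeds): treat as a word break
            chars.append(" ")
        elif category.startswith("C"):
            # Format, private-use and unassigned characters
            # (zero-width space, soft hyphen, ...): drop entirely
            continue
        else:
            chars.append(ch)

    # Collapse runs of whitespace into single spaces and trim the ends
    return " ".join("".join(chars).split())


print(repr(clean_text("Hello\x0cworld\u200b!\n")))
```

Printing with `repr` rather than `print` alone is deliberate: it makes any remaining invisible characters show up as escape sequences.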

I've found the best way to debug issues like this is to start with a small piece of the result text (like a single word) and then iteratively add text until you reproduce a non-match that should have been a match. That will most likely expose the characters that need to be cleaned out to get a full match.
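That word-by-word approach can be sketched roughly as follows. Everything here is an assumption for illustration: `find_breaking_word` is a hypothetical helper, and the `matches` callback is a plain substring check standing in for whatever actually decides a match in your setup (for example, running the highlighter and checking whether anything was highlighted).

```python
def find_breaking_word(query, matches):
    """Grow the query one word at a time and report where matching first fails.

    Returns (number of words that still matched, the word that broke the match),
    or (total word count, None) if the whole query matches.
    """
    words = query.split()
    for count in range(1, len(words) + 1):
        candidate = " ".join(words[:count])
        if not matches(candidate):
            return count - 1, words[count - 1]
    return len(words), None


# Stand-in data: the query carries a hidden zero-width space after "learning"
document = "Deep learning models are effective"
query = "Deep learning\u200b models are"

matched, culprit = find_breaking_word(query, lambda text: text in document)

# repr() exposes invisible characters as escape sequences
print(matched, repr(culprit))
```

The word returned as the culprit is the first place to look for characters that need to be cleaned out.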

Highlighter Can't Highlight All the Text in the Document #8