mikeligthart / annotated_pdf_analyzer

Preprocess and analyse annotated pdfs for the Robotstories project.
GNU General Public License v3.0

Align highlighted text with text from comments. #1

Open mikeligthart opened 8 months ago

mikeligthart commented 8 months ago

Problem: currently, the text from the highlights and the text from the comments are paired without any check. Not every highlight has a comment, so misalignments are bound to occur.

The output of _extract_annotations_in_pdf is a list of tuples. For example, when there are three highlights and only the first and last have a comment attached, it should look like this: [(highlighted_text_1, comment_1), (highlighted_text_2,), (highlighted_text_3, comment_3)].
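
One possible way to get this output is to build each tuple from a single annotation record instead of zipping two separately collected lists. The sketch below is only an illustration under assumptions: the function name pair_highlights_with_comments and the dictionary keys "highlight" and "comment" are hypothetical, and how the records are read from the PDF is left out because it depends on the PDF library in use.

```python
from typing import Dict, List, Optional, Tuple


def pair_highlights_with_comments(
    annotations: List[Dict[str, Optional[str]]],
) -> List[Tuple[str, ...]]:
    """Build one tuple per highlight, adding the comment only if it exists.

    Each record is assumed (hypothetically) to hold the highlighted text
    under "highlight" and, optionally, the comment text under "comment".
    """
    pairs: List[Tuple[str, ...]] = []
    for annotation in annotations:
        highlight = annotation["highlight"]
        comment = annotation.get("comment")
        if comment:
            # The highlight has a non-empty comment: emit a 2-tuple.
            pairs.append((highlight, comment))
        else:
            # Missing or empty comment: emit a 1-tuple with the highlight only.
            pairs.append((highlight,))
    return pairs


example = [
    {"highlight": "highlighted_text_1", "comment": "comment_1"},
    {"highlight": "highlighted_text_2"},
    {"highlight": "highlighted_text_3", "comment": "comment_3"},
]
result = pair_highlights_with_comments(example)
```

Because the comment is looked up on the same record as its highlight, a missing comment cannot shift later comments onto the wrong highlight.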