mgmeyers / pdfannots2json

GNU Affero General Public License v3.0

Accented characters do not appear in comments #20

Open pho-souza opened 1 year ago

pho-souza commented 1 year ago

Hi,

I noticed a problem when extracting annotations. Accented characters are missing from the "comment" field, although they appear correctly in "annotatedText".

In the file I used, which is attached below, accented text such as "é um formato de arquivo" and "padrão" appears correctly in the "annotatedText" field of the resulting JSON file.

The "comment" field, on the other hand, loses every accented character in the comments, such as "ã", "ç" and "á". They are dropped from the output rather than replaced with anything.
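
A minimal sketch for checking this in the generated JSON, assuming the output is a list of annotation objects that carry the "annotatedText" and "comment" fields named above. The helper `non_ascii_by_field` and the sample data are made up for illustration; they are not part of pdfannots2json and not taken from the attached file.

```python
import json


def non_ascii_by_field(annotations, fields=("annotatedText", "comment")):
    """Collect the distinct non-ASCII characters found in each field."""
    found = {field: set() for field in fields}
    for annotation in annotations:
        for field in fields:
            # Treat a missing or null field as an empty string.
            value = annotation.get(field) or ""
            found[field].update(ch for ch in value if ord(ch) > 127)
    return found


# Hypothetical sample that mimics the reported behaviour: the accents
# survive in "annotatedText" but are dropped from "comment".
sample = json.loads("""
[
  {"annotatedText": "é um formato de arquivo", "comment": "anotao"},
  {"annotatedText": "padrão", "comment": "comentrio"}
]
""")

result = non_ascii_by_field(sample)
for field, chars in result.items():
    print(field, sorted(chars))
```

To run the same check on real output, load the file with `json.load` and pass the resulting list to the helper. An empty set for "comment" next to a non-empty set for "annotatedText" would match the behaviour described here.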

Screenshot of annotated PDF


Here are the PDF file and the JSON output (saved as .txt) generated by pdfannots2json.