mgmeyers / pdfannots2json

GNU Affero General Public License v3.0

Certain letters are constantly (reproducibly) missing when extracting #22

Open sirlaughalot90 opened 10 months ago

sirlaughalot90 commented 10 months ago

When extracting text, certain combinations of letters are not extracted properly; e.g. 'Th' (as in 'The') and 'fi' are particularly troublesome.

[Screenshot: CleanShot 2023-12-12 at 11 09 06@2x]

sirlaughalot90 commented 9 months ago

Hello @mgmeyers, hate to bother you, just want to make sure that the issue has been registered (because it is a constant interruption in my workflow...). Thank you!