ocrmypdf / OCRmyPDF-EasyOCR

OCRmyPDF EasyOCR plugin
MIT License

Newlines missing in sidecar #4

Open BillyCroan opened 5 months ago

BillyCroan commented 5 months ago

I installed this EasyOCR plugin via pipx and compared a batch of files between stock ocrmypdf and the plugin. EasyOCR is far more accurate at getting the letters right, but the sidecar text comes out as a single line. That is less than ideal and looks like a bug to me.

If I run pdftotext on the output PDF, the text comes out on multiple lines. The sidecar, however, has no line breaks at all.

To reproduce, run with --sidecar. I can provide a JPG if you want.

jbarlow83 commented 5 months ago

The output format from EasyOCR doesn't really include line grouping, so that information has to be inferred. Using pdftotext -layout on the output PDF should give an accurate reconstruction.
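
For anyone wondering what "inferred" could look like in practice, here is a minimal sketch. It is not the plugin's code. It assumes detections in the shape EasyOCR's readtext returns by default, a list of (bbox, text, confidence) tuples where bbox is four [x, y] corner points. The function name group_into_lines, the tolerance parameter, and the sample data are hypothetical, made up for illustration.

```python
from statistics import median


def group_into_lines(results, tolerance=0.5):
    """Group EasyOCR-style detections into lines of text.

    results: iterable of (bbox, text, confidence), where bbox is four
             [x, y] corner points (assumed input shape).
    tolerance: hypothetical knob; boxes whose vertical centers differ by
               at most tolerance * median box height share a line.
    """
    boxes = []
    for bbox, text, _confidence in results:
        xs = [point[0] for point in bbox]
        ys = [point[1] for point in bbox]
        # Keep left edge, vertical center, height, and the text itself.
        boxes.append((min(xs), (min(ys) + max(ys)) / 2, max(ys) - min(ys), text))
    if not boxes:
        return ""

    threshold = tolerance * median(box[2] for box in boxes)
    boxes.sort(key=lambda box: box[1])  # top to bottom

    lines = [[boxes[0]]]
    for box in boxes[1:]:
        current = lines[-1]
        line_center = sum(b[1] for b in current) / len(current)
        if abs(box[1] - line_center) <= threshold:
            current.append(box)  # close enough vertically: same line
        else:
            lines.append([box])  # start a new line

    # Within each line, order words left to right.
    return "\n".join(
        " ".join(b[3] for b in sorted(line, key=lambda b: b[0]))
        for line in lines
    )


# Made-up detections, deliberately out of reading order.
sample = [
    ([[120, 10], [200, 10], [200, 30], [120, 30]], "world", 0.90),
    ([[10, 12], [100, 12], [100, 32], [10, 32]], "Hello", 0.95),
    ([[10, 50], [150, 50], [150, 70], [10, 70]], "second line", 0.80),
]
print(group_into_lines(sample))
# Hello world
# second line
```

A heuristic like this breaks down on multi-column or skewed pages, which is part of why pdftotext -layout is the safer route for now.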