Status: Open. james-s-w-clark opened this issue 5 years ago.
Hello :) I need to do the same thing. Were you able to find a solution for this? I would be really grateful if you could help. Kind regards
You could let tesseract treat this as two single-column images by splitting the original image into its left and right columns.
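A minimal sketch of that suggestion, assuming a light background and a vertical whitespace gutter somewhere near the middle of the page. The helpers `find_gutter` and `split_boxes` are hypothetical (they are not part of tesseract); they take a grayscale image as a list of pixel rows and return two crop boxes in `(left, upper, right, lower)` order.

```python
from typing import List, Tuple

# (left, upper, right, lower): the same order Pillow's Image.crop() expects.
Box = Tuple[int, int, int, int]


def find_gutter(pixels: List[List[int]], threshold: int = 200, search: float = 0.25) -> int:
    """Return the x index of the whitest column near the horizontal centre.

    pixels: grayscale rows, 0 = black, 255 = white.
    threshold: values >= threshold count as background (an assumption).
    search: fraction of the width on each side of the centre to scan.
    """
    height = len(pixels)
    width = len(pixels[0])
    centre = width // 2
    lo = max(0, int(centre - search * width))
    hi = min(width, int(centre + search * width) + 1)
    best_x, best_count = centre, -1
    for x in range(lo, hi):
        # Number of background pixels in column x.
        count = sum(1 for y in range(height) if pixels[y][x] >= threshold)
        # Keep the whitest column; on ties prefer the one closest to the centre.
        if count > best_count or (
            count == best_count and abs(x - centre) < abs(best_x - centre)
        ):
            best_x, best_count = x, count
    return best_x


def split_boxes(pixels: List[List[int]], **kwargs) -> Tuple[Box, Box]:
    """Crop boxes for the left and right columns, split at the gutter."""
    height = len(pixels)
    width = len(pixels[0])
    x = find_gutter(pixels, **kwargs)
    return (0, 0, x, height), (x, 0, width, height)
```

With Pillow, the rows could be built from `Image.open(path).convert("L")` and `Image.getdata()`, each box passed to `Image.crop()`, and each half then run through tesseract with a single-column mode such as `--psm 4` instead of `--psm 1`. This only targets the columns bleeding into each other; it may not help with the very short lines.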
Environment
Current Behavior:
In a two-column image, ordered/unordered lists of increasing length affect the recognition of the other column and of the bullet points. This is with `--psm 1` and `-l eng`.
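To rerun the report on both inputs, a minimal sketch that builds the same command line (`--psm 1`, `-l eng`). The helper `tesseract_cmd` is hypothetical; passing `stdout` as the output base makes tesseract print the recognized text instead of writing a `.txt` file.

```python
from typing import List


def tesseract_cmd(image: str, psm: int, lang: str = "eng", outputbase: str = "stdout") -> List[str]:
    """Build a tesseract CLI invocation as an argument list for subprocess."""
    return ["tesseract", image, outputbase, "--psm", str(psm), "-l", lang]
```

For example, `subprocess.run(tesseract_cmd("input1.png", 1), capture_output=True, text=True, check=True).stdout` would return the text for the first image; `input1.png` is a placeholder name for Input 1.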
Input 1:
And a slightly different Input 2:
Expected Behavior:
Tesseract should segment the text into two columns and: 1) identify all the bullet-point numbers (in both columns); 2) identify the text on lines that contain only a little text (perhaps too sparse for recognition?). It seems that at least 4 characters are needed on a line for it to be recognized (but then the two-line bullet "1." under section "5." should still be readable).
Suggested Fix:
I don't have a suggestion for this.