Should We Implement PSM (page segmentation mode) Into Tesseract as an Enum Option

madmaze / pytesseract

A Python wrapper for Google Tesseract

Apache License 2.0


Instead of setting the PSM through the config string, should it be a parameter of the pytesseract functions, exposed as a sort of Enum?

Tesseract documentation for PSMs:

- 0: Orientation and script detection (OSD) only.
- 1: Automatic page segmentation with OSD.
- 2: Automatic page segmentation, but no OSD, or OCR.
- 3: Fully automatic page segmentation, but no OSD. (Default)
- 4: Assume a single column of text of variable sizes.
- 5: Assume a single uniform block of vertically aligned text.
- 6: Assume a single uniform block of text.
- 7: Treat the image as a single text line.
- 8: Treat the image as a single word.
- 9: Treat the image as a single word in a circle.
- 10: Treat the image as a single character.
- 11: Sparse text. Find as much text as possible in no particular order.
- 12: Sparse text with OSD.
- 13: Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
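To make the proposal concrete, here is a minimal sketch of what such an Enum could look like. The names `PSM` and `build_config`, and all the member names, are hypothetical and not part of pytesseract; only the numeric values come from the list above. The sketch just produces the `--psm N` string that is currently written by hand.

```python
from enum import IntEnum


class PSM(IntEnum):
    """Hypothetical enum mirroring Tesseract's page segmentation modes."""
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2
    AUTO = 3  # Tesseract's default
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12
    RAW_LINE = 13


def build_config(psm: PSM = PSM.AUTO, extra: str = "") -> str:
    """Return a config string such as '--psm 7', plus any extra flags.

    Hypothetical helper: accepts a PSM member or a plain int and
    raises ValueError for values outside 0..13.
    """
    psm = PSM(psm)  # validates plain ints as well as enum members
    return f"--psm {psm.value} {extra}".strip()
```

With something like this, a call could pass `config=build_config(PSM.SINGLE_LINE)` to `pytesseract.image_to_string` instead of a hand-written `'--psm 7'`. Whether the Enum should instead become its own keyword argument is the open question of this issue.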


Should We Implement PSM (page segmentation mode) Into Tesseract as an Enum Option #441