The PSM constants defined in ccstruct/publictypes.h are:
enum PageSegMode {
  PSM_OSD_ONLY,       ///< Orientation and script detection only.
  PSM_AUTO_OSD,       ///< Automatic page segmentation with orientation and
                      ///< script detection. (OSD)
  PSM_AUTO_ONLY,      ///< Automatic page segmentation, but no OSD, or OCR.
  PSM_AUTO,           ///< Fully automatic page segmentation, but no OSD.
  PSM_SINGLE_COLUMN,  ///< Assume a single column of text of variable sizes.
  PSM_SINGLE_BLOCK_VERT_TEXT,  ///< Assume a single uniform block of vertically
                               ///< aligned text.
  PSM_SINGLE_BLOCK,   ///< Assume a single uniform block of text. (Default.)
  PSM_SINGLE_LINE,    ///< Treat the image as a single text line.
  PSM_SINGLE_WORD,    ///< Treat the image as a single word.
  PSM_CIRCLE_WORD,    ///< Treat the image as a single word in a circle.
  PSM_SINGLE_CHAR,    ///< Treat the image as a single character.
  PSM_COUNT           ///< Number of enum entries.
};
The ones in TessBaseAPI.java are:
/** Fully automatic page segmentation. */
public static final int PSM_AUTO = 0;
/** Assume a single column of text of variable sizes. */
public static final int PSM_SINGLE_COLUMN = 1;
/** Assume a single uniform block of text. (Default) */
public static final int PSM_SINGLE_BLOCK = 2;
/** Treat the image as a single text line. */
public static final int PSM_SINGLE_LINE = 3;
/** Treat the image as a single word. */
public static final int PSM_SINGLE_WORD = 4;
/** Treat the image as a single character. */
public static final int PSM_SINGLE_CHAR = 5;
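Because a C++ enum numbers its entries from 0, the C++ values are
PSM_OSD_ONLY = 0, PSM_AUTO_OSD = 1, PSM_AUTO_ONLY = 2, PSM_AUTO = 3, and so on.
The Java constant PSM_AUTO (0) therefore corresponds to PSM_OSD_ONLY in the
Tesseract C++ API. To get the effect of AUTO from Java code, you currently have
to pass PSM_SINGLE_COLUMN (1, which maps to PSM_AUTO_OSD) or PSM_SINGLE_LINE
(3, which maps to PSM_AUTO). This needs to be fixed.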
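One possible shape for the fix is sketched below. It assumes the enum listing from publictypes.h quoted above is complete and has no explicit initializers, so each constant takes its zero-based position in the enum. The class name PageSegModeSketch is hypothetical and only for illustration; in practice the constants would live in TessBaseAPI.java, and other Tesseract versions may define additional modes.

```java
/**
 * Hypothetical corrected page segmentation constants.
 * Each value is the zero-based position of the entry with the same name
 * in the PageSegMode enum quoted from ccstruct/publictypes.h.
 */
final class PageSegModeSketch {
    /** Orientation and script detection only. */
    public static final int PSM_OSD_ONLY = 0;
    /** Automatic page segmentation with orientation and script detection. */
    public static final int PSM_AUTO_OSD = 1;
    /** Automatic page segmentation, but no OSD, or OCR. */
    public static final int PSM_AUTO_ONLY = 2;
    /** Fully automatic page segmentation, but no OSD. */
    public static final int PSM_AUTO = 3;
    /** Assume a single column of text of variable sizes. */
    public static final int PSM_SINGLE_COLUMN = 4;
    /** Assume a single uniform block of vertically aligned text. */
    public static final int PSM_SINGLE_BLOCK_VERT_TEXT = 5;
    /** Assume a single uniform block of text. (Default) */
    public static final int PSM_SINGLE_BLOCK = 6;
    /** Treat the image as a single text line. */
    public static final int PSM_SINGLE_LINE = 7;
    /** Treat the image as a single word. */
    public static final int PSM_SINGLE_WORD = 8;
    /** Treat the image as a single word in a circle. */
    public static final int PSM_CIRCLE_WORD = 9;
    /** Treat the image as a single character. */
    public static final int PSM_SINGLE_CHAR = 10;
    /** Number of enum entries. */
    public static final int PSM_COUNT = 11;

    // Constants holder only; not meant to be instantiated.
    private PageSegModeSketch() {}
}
```

With values like these, passing PSM_AUTO from Java would select PSM_AUTO on the native side instead of PSM_OSD_ONLY.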
Original issue reported on code.google.com by loni...@gmail.com on 1 Jul 2012 at 7:30