Closed by Shreeshrii 6 years ago
USAGE: classifier_tester [.tr files ...]
--debug_level Level of Trainer debugging (type:int default:0)
--load_images Load images with tr files (type:int default:0)
--clusterconfig_min_samples_fraction Min number of samples per proto as % of total (type:double default:0.625)
--clusterconfig_max_illegal Max percentage of samples in a cluster which have more than 1 feature in that cluster (type:double default:0.05)
--clusterconfig_independence Desired independence between dimensions (type:double default:1)
--clusterconfig_confidence Desired confidence in prototypes created (type:double default:1e-06)
--classifier Classifier to test (type:string default:)
--lang Language to test (type:string default:eng)
--tessdata_dir Directory of traineddata files (type:string default:)
--configfile File to load more configs from (type:string default:)
--D Directory to write output files to (type:string default:)
--F File listing font properties (type:string default:font_properties)
--X File listing font xheights (type:string default:)
--U File to load unicharset from (type:string default:unicharset)
--O File to write unicharset to (type:string default:)
--output_trainer File to write trainer to (type:string default:)
--test_ch UTF8 test character string (type:string default:)
From https://github.com/tesseract-ocr/tesseract/blob/master/training/classifier_tester.cpp:
// This program has complex setup requirements, so here is some help:
// Two different modes, tr files and serialized mastertrainer.
// From tr files:
// classifier_tester -U unicharset -F font_properties -X xheights
// -classifier x -lang lang [-output_trainer trainer] *.tr
// From a serialized trainer:
// classifier_tester -input_trainer trainer [-lang lang] -classifier x
//
// In the first case, the unicharset must be the unicharset from within
// the classifier under test, and the font_properties and xheights files must
// match the files used during training.
// In the second case, the trainer file must have been prepared from
// some previous run of shapeclustering, mftraining, or classifier_tester
// using the same conditions as above, ie matching unicharset/font_properties.
//
// Available values of classifier (x above) are:
// pruner : Tesseract class pruner only.
// full : Tesseract full classifier.
// with an input trainer.)
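The two modes described in the source comment above can be sketched as a dry run that only prints the command lines. The flags are the ones named in the comment; every file name (unicharset, font_properties, xheights, trainer.bin, the .tr file) and the choice of `full` and `pruner` are placeholders picked for illustration, not values from this issue.

```shell
#!/bin/sh
# Dry run: print the two classifier_tester command lines instead of running them.
# All file names below are placeholders (assumptions), not values from the issue.
UNICHARSET=unicharset
FONT_PROPERTIES=font_properties
XHEIGHTS=xheights
TRAINER=trainer.bin
LANG_CODE=eng

# Mode 1: start from .tr files (passed as arguments) and save the trainer.
from_tr_files() {
  echo "classifier_tester -U $UNICHARSET -F $FONT_PROPERTIES -X $XHEIGHTS" \
       "-classifier full -lang $LANG_CODE -output_trainer $TRAINER" "$@"
}

# Mode 2: reuse the serialized trainer written by an earlier run.
from_trainer() {
  echo "classifier_tester -input_trainer $TRAINER -lang $LANG_CODE -classifier pruner"
}

from_tr_files eng.arial.exp0.tr
from_trainer
```

Once the printed lines look right, they can be pasted into a shell where the training tools are installed. As the comment notes, the unicharset, font_properties and xheights files have to match the ones used during training.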
USAGE: lstmeval [.tr files ...]
--max_image_MB Max memory to use for images. (type:int default:2000)
--debug_level Level of Trainer debugging (type:int default:0)
--load_images Load images with tr files (type:int default:0)
--clusterconfig_min_samples_fraction Min number of samples per proto as % of total (type:double default:0.625)
--clusterconfig_max_illegal Max percentage of samples in a cluster which have more than 1 feature in that cluster (type:double default:0.05)
--clusterconfig_independence Desired independence between dimensions (type:double default:1)
--clusterconfig_confidence Desired confidence in prototypes created (type:double default:1e-06)
--model Name of model file (training or recognition) (type:string default:)
--eval_listfile File listing sample files in lstmf training format. (type:string default:)
--configfile File to load more configs from (type:string default:)
--D Directory to write output files to (type:string default:)
--F File listing font properties (type:string default:font_properties)
--X File listing font xheights (type:string default:)
--U File to load unicharset from (type:string default:unicharset)
--O File to write unicharset to (type:string default:)
--output_trainer File to write trainer to (type:string default:)
--test_ch UTF8 test character string (type:string default:)
The usage line reads:
USAGE: lstmeval [.tr files ...]
Should that be .lstmf files instead?
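The description of --eval_listfile in the help output above suggests that lstmeval takes its samples through a list file of .lstmf paths rather than through positional arguments. A minimal sketch under that assumption follows; the helper names, the scratch directory and model.lstm are hypothetical, and the final command is only printed, not run.

```shell
#!/bin/sh
# Write the path of every .lstmf file in directory $1 to list file $2, one per line.
make_listfile() {
  : > "$2"
  for f in "$1"/*.lstmf; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f" >> "$2"
    fi
  done
}

# Dry run: print the lstmeval command rather than executing it.
# $1 = model file (placeholder), $2 = list file
lstmeval_cmd() {
  echo "lstmeval --model $1 --eval_listfile $2"
}

# Demo with empty placeholder files in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/page1.lstmf" "$demo_dir/page2.lstmf"
make_listfile "$demo_dir" "$demo_dir/eval.list"
lstmeval_cmd model.lstm "$demo_dir/eval.list"
```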
USAGE: lstmtraining [.tr files ...]
--debug_interval How often to display the alignment. (type:int default:0)
--train_mode Controls gross training behavior. (type:int default:80)
--net_mode Controls network behavior. (type:int default:192)
--perfect_sample_delay How many imperfect samples between perfect ones. (type:int default:4)
--max_image_MB Max memory to use for images. (type:int default:6000)
--append_index Index in continue_from Network at which to attach the new network defined by net_spec (type:int default:-1)
--max_iterations If set, exit after this many iterations (type:int default:0)
--debug_level Level of Trainer debugging (type:int default:0)
--load_images Load images with tr files (type:int default:0)
--target_error_rate Final error rate in percent. (type:double default:0.01)
--weight_range Range of initial random weights. (type:double default:0.1)
--learning_rate Weight factor for new deltas. (type:double default:0.0001)
--momentum Decay factor for repeating deltas. (type:double default:0.9)
--clusterconfig_min_samples_fraction Min number of samples per proto as % of total (type:double default:0.625)
--clusterconfig_max_illegal Max percentage of samples in a cluster which have more than 1 feature in that cluster (type:double default:0.05)
--clusterconfig_independence Desired independence between dimensions (type:double default:1)
--clusterconfig_confidence Desired confidence in prototypes created (type:double default:1e-06)
--stop_training Just convert the training model to a runtime model. (type:bool default:false)
--debug_network Get info on distribution of weight values (type:bool default:false)
--net_spec Network specification (type:string default:)
--continue_from Existing model to extend (type:string default:)
--model_output Basename for output models (type:string default:lstmtrain)
--script_dir Required to set unicharset properties or use unicharset compression. (type:string default:)
--train_listfile File listing training files in lstmf training format. (type:string default:)
--eval_listfile File listing eval files in lstmf training format. (type:string default:)
--configfile File to load more configs from (type:string default:)
--D Directory to write output files to (type:string default:)
--F File listing font properties (type:string default:font_properties)
--X File listing font xheights (type:string default:)
--U File to load unicharset from (type:string default:unicharset)
--O File to write unicharset to (type:string default:)
--output_trainer File to write trainer to (type:string default:)
--test_ch UTF8 test character string (type:string default:)
The usage line reads:
USAGE: lstmtraining [.tr files ...]
Should that be .lstmf files instead?
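Going by the flag descriptions above, lstmtraining also appears to read its data from list files (--train_listfile, --eval_listfile). The dry run below sketches two typical uses with only flags from this help output: continuing training from an existing model, and converting a training model to a runtime model with --stop_training. All file names are hypothetical, the checkpoint name is an assumption about how output files are named, and later releases may require flags not listed here.

```shell
#!/bin/sh
# Dry run: print lstmtraining command lines instead of running them.
MODEL_BASE=mymodel       # hypothetical basename for --model_output
START_MODEL=base.lstm    # hypothetical existing model for --continue_from
TRAIN_LIST=train.list    # hypothetical list of .lstmf training files
EVAL_LIST=eval.list      # hypothetical list of .lstmf eval files

# Continue training for at most $1 iterations.
train_cmd() {
  echo "lstmtraining --continue_from $START_MODEL --model_output $MODEL_BASE" \
       "--train_listfile $TRAIN_LIST --eval_listfile $EVAL_LIST --max_iterations $1"
}

# Convert a training model ($1) to a runtime model ($2).
convert_cmd() {
  echo "lstmtraining --stop_training --continue_from $1 --model_output $2"
}

train_cmd 3000
# The "_checkpoint" suffix is an assumption about the training output name.
convert_cmd "${MODEL_BASE}_checkpoint" "$MODEL_BASE.lstm"
```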
USAGE: set_unicharset_properties
--debug_level Level of Trainer debugging (type:int default:0)
--load_images Load images with tr files (type:int default:0)
--clusterconfig_min_samples_fraction Min number of samples per proto as % of total (type:double default:0.625)
--clusterconfig_max_illegal Max percentage of samples in a cluster which have more than 1 feature in that cluster (type:double default:0.05)
--clusterconfig_independence Desired independence between dimensions (type:double default:1)
--clusterconfig_confidence Desired confidence in prototypes created (type:double default:1e-06)
--script_dir Directory name for input script unicharsets/xheights (type:string default:)
--configfile File to load more configs from (type:string default:)
--D Directory to write output files to (type:string default:)
--F File listing font properties (type:string default:font_properties)
--X File listing font xheights (type:string default:)
--U File to load unicharset from (type:string default:unicharset)
--O File to write unicharset to (type:string default:)
--output_trainer File to write trainer to (type:string default:)
--test_ch UTF8 test character string (type:string default:)
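Of the flags listed above, -U, -O and --script_dir look like the ones that matter for set_unicharset_properties itself. A hedged sketch of an invocation, printed rather than executed, with placeholder file and directory names:

```shell
#!/bin/sh
# Dry run: print a set_unicharset_properties command line.
# $1 = input unicharset, $2 = output unicharset, $3 = script directory
# (all three values used below are placeholders, not taken from the issue).
props_cmd() {
  echo "set_unicharset_properties -U $1 -O $2 --script_dir $3"
}

props_cmd unicharset unicharset.with_props langdata
```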
USAGE: text2image
--exposure Exposure level in photocopier (type:int default:0)
--resolution Pixels per inch (type:int default:300)
--xsize Width of output image (type:int default:3600)
--ysize Height of output image (type:int default:4800)
--margin Margin round edges of image (type:int default:100)
--ptsize Size of printed text (type:int default:12)
--leading Inter-line space (in pixels) (type:int default:12)
--box_padding Padding around produced bounding boxes (type:int default:0)
--glyph_resized_size Each glyph is square with this side length in pixels (type:int default:0)
--glyph_num_border_pixels_to_pad Final_size=glyph_resized_size+2*glyph_num_border_pixels_to_pad (type:int default:0)
--tlog_level Minimum logging level for tlog() output (type:int default:0)
--char_spacing Inter-character space in ems (type:double default:0)
--underline_start_prob Fraction of words to underline (value in [0,1]) (type:double default:0)
--underline_continuation_prob Fraction of words to underline (value in [0,1]) (type:double default:0)
--min_coverage If find_fonts==true, the minimum coverage the font has of the characters in the text file to include it, between 0 and 1. (type:double default:1)
--degrade_image Degrade rendered image with speckle noise, dilation/erosion and rotation (type:bool default:true)
--rotate_image Rotate the image in a random way. (type:bool default:true)
--strip_unrenderable_words Remove unrenderable words from source text (type:bool default:true)
--ligatures Rebuild and render ligatures (type:bool default:false)
--find_fonts Search for all fonts that can render the text (type:bool default:false)
--render_per_font If find_fonts==true, render each font to its own image. Image filenames are of the form output_name.font_name.tif (type:bool default:true)
--list_available_fonts List available fonts and quit. (type:bool default:false)
--render_ngrams Put each space-separated entity from the input file into one bounding box. The ngrams in the input file will be randomly permuted before rendering (so that there is sufficient variety of characters on each line). (type:bool default:false)
--output_word_boxes Output word bounding boxes instead of character boxes. This is used for Cube training, and implied by --render_ngrams. (type:bool default:false)
--bidirectional_rotation Rotate the generated characters both ways. (type:bool default:false)
--only_extract_font_properties Assumes that the input file contains a list of ngrams. Renders each ngram, extracts spacing properties and records them in output_base/[font_name].fontinfo file. (type:bool default:false)
--output_individual_glyph_images If true also outputs individual character images (type:bool default:false)
--text File name of text input to process (type:string default:)
--outputbase Basename for output image/box file (type:string default:)
--writing_mode Specify one of the following writing modes.
'horizontal' : Render regular horizontal text. (default)
'vertical' : Render vertical text. Glyph orientation is selected by Pango.
'vertical-upright' : Render vertical text. Glyph orientation is set to be upright. (type:string default:horizontal)
--font Font description name to use (type:string default:Arial)
--unicharset_file File with characters in the unicharset. If --render_ngrams is true and --unicharset_file is specified, ngrams with characters that are not in unicharset will be omitted (type:string default:)
--fontconfig_tmpdir Overrides fontconfig default temporary dir (type:string default:/tmp)
--fonts_dir If empty it use system default. Otherwise it overrides system default font location (type:string default:)
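As a rough illustration of how the text2image flags above combine, the dry run below prints one command that lists the available fonts and one that renders a text file to an image/box pair. The font directory, text file and output basename are placeholders; the --resolution and --ptsize values simply repeat the defaults shown in the help output.

```shell
#!/bin/sh
# Dry run: print text2image command lines instead of running them.

# List fonts found under a font directory ($1 is a placeholder path).
list_fonts_cmd() {
  echo "text2image --list_available_fonts --fonts_dir $1"
}

# Render a text file: $1 = text file, $2 = output basename, $3 = font name.
render_cmd() {
  echo "text2image --text $1 --outputbase $2 --font '$3' --resolution 300 --ptsize 12"
}

list_fonts_cmd /usr/share/fonts
render_cmd training_text.txt eng.Arial.exp0 "Arial"
```

The font name is wrapped in single quotes in the printed command so that names containing spaces survive being pasted into a shell.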
Closing this as a duplicate of the issue filed by @jbreiden: missing manpages for v4 training binaries #1297
There are no man pages for these programs in https://github.com/tesseract-ocr/tesseract/tree/master/doc