Closed M3ssman closed 8 months ago
Need to rename flag -m, --mode-sequential to -s, --sequential-mode
Still buggy at first sight: the model mapping gets appended again on each call, like this:
--parameter='{"model": "fas+fas+fas+fas+fas+fas+fas+fas+fas+fas+fas+fas+fas",
"textequiv_level": "glyph", "dpi": 0, "padding": 0, "segmentation_level": "word", "overwrite_segments": false, "overwrite_text": true, "shrink_polygons": false, "block_polygons": false, "find_tables": true, "find_staves": false, "sparse_text": false, "raw_lines": false, "char_whitelist": "", "char_blacklist": "", "char_unblacklist": "", "tesseract_parameters": {}, "xpath_parameters": {}, "xpath_model": {}, "auto_model": false, "oem": "DEFAULT"}'
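The actual cause is not confirmed here. One common way to end up with a repeated "fas+fas+..." value is a parameter dict that is shared between calls and extended in place. The sketch below reproduces that pattern and shows a variant that builds a fresh copy per call. All names (DEFAULT_PARAMS, build_params_buggy, build_params_fixed) are hypothetical and not taken from the ODEM code base.

```python
import copy

# Hypothetical defaults, reduced to two keys from the parameter dump above.
DEFAULT_PARAMS = {"model": "", "textequiv_level": "glyph"}


def build_params_buggy(shared_params, model):
    """Extends the model string on the shared dict, so it grows per call."""
    current = shared_params["model"]
    shared_params["model"] = f"{current}+{model}" if current else model
    return shared_params


def build_params_fixed(defaults, models):
    """Works on a deep copy and de-duplicates models, preserving order."""
    params = copy.deepcopy(defaults)
    params["model"] = "+".join(dict.fromkeys(models))
    return params


# Three calls against the same dict reproduce the accumulation.
shared = copy.deepcopy(DEFAULT_PARAMS)
for _ in range(3):
    buggy = build_params_buggy(shared, "fas")

# Three calls with a fresh copy each time stay stable.
for _ in range(3):
    fixed = build_params_fixed(DEFAULT_PARAMS, ["fas"])
```

If the real code follows a similar pattern, copying the defaults per image (or per call) should be enough to keep the --parameter value stable.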
Description
Currently, if run in local mode without any METS/MODS-metadata, ODEM expects image files to carry information about their preferred language/model.
While this approach suits a large GT corpus like ODEM, with lots of different languages and language combinations, it would be handy to provide a flag -l, --languages for the command-line API to set language information for a specific directory explicitly, rather than renaming image files to fit the original approach. This should be combined with a second flag -m, --mappings to also inject model mapping information at runtime.
Further, this allows keeping the image files as they are and enables mixed modes of evaluation on the fly. One could then use existing information (from the filename or metadata, if present) and also enforce different languages and mappings that are not known beforehand, for evaluation purposes in training scenarios.
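A minimal sketch of how the proposed flags could be wired up with argparse. This is an assumption about a possible design, not the existing ODEM CLI: the value formats ("+"-separated languages, comma-separated key=value mappings) and the helper names split_languages, parse_mappings and resolve_models are hypothetical.

```python
import argparse


def build_parser():
    """Parser with the flags discussed in this issue (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="local OCR run (sketch)")
    parser.add_argument("-s", "--sequential-mode", action="store_true",
                        help="process images one after another")
    parser.add_argument("-l", "--languages", default=None,
                        help="assumed format: 'fas+ara'")
    parser.add_argument("-m", "--mappings", default=None,
                        help="assumed format: 'lang=model,lang=model'")
    return parser


def split_languages(raw):
    """Split a '+'-separated language string into a list."""
    return [part for part in raw.split("+") if part] if raw else []


def parse_mappings(raw):
    """Turn 'a=x,b=y' into {'a': 'x', 'b': 'y'}; reject malformed pairs."""
    mappings = {}
    if not raw:
        return mappings
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"invalid mapping entry: {pair!r}")
        mappings[key.strip()] = value.strip()
    return mappings


def resolve_models(file_langs, cli_langs, mappings):
    """Prefer languages from the command line, else those from the file."""
    langs = cli_langs or file_langs
    return "+".join(mappings.get(lang, lang) for lang in langs)


args = build_parser().parse_args(["-s", "-l", "fas+ara", "-m", "ara=ara_best"])
models = resolve_models(["lat"], split_languages(args.languages),
                        parse_mappings(args.mappings))
```

With this layout, languages given on the command line take precedence over those derived from the filename or metadata, and a language without a mapping entry falls back to its own name as the model. Leaving out -l keeps the current filename-based behaviour.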