Open awiebe opened 7 years ago
You can simply create a builder object yourself. Have a look at https://github.com/jflesch/pyocr/blob/master/src/pyocr/tesseract.py#L57 for an example. Basically, you just need to inherit from BaseBuilder and define tess_conf = ["--load_system_dawg", "0"], file_ext = ['the_file_extension_that_tesseract_will_use'], and the methods read_file() and write_file().
If you implement such a builder, feel free to send a pull request to include it in src/pyocr/tesseract.py.
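For illustration, here is a minimal sketch of the shape such a builder might take. It is deliberately standalone so it runs without pyocr installed: in real use the class would inherit from pyocr's BaseBuilder, and the attribute names (tess_conf, file_ext) are taken from the comment above, so they may not match every pyocr version. The class name NoSystemDawgTextBuilder and the "txt" extension are assumptions made for the example.

```python
import io


class NoSystemDawgTextBuilder:
    """Sketch of a plain-text builder that disables the system dictionary.

    Assumption: in real use this would subclass pyocr's BaseBuilder.
    It is kept standalone here so the example runs on its own.
    """

    # Extra arguments for tesseract, exactly as quoted in the thread.
    tess_conf = ["--load_system_dawg", "0"]
    # Extension of the output file tesseract is expected to write (assumed).
    file_ext = ["txt"]

    @staticmethod
    def read_file(file_descriptor):
        # Turn tesseract's raw output into the value returned to the caller.
        return file_descriptor.read().strip()

    @staticmethod
    def write_file(file_descriptor, text):
        # Write a previously read result back out.
        file_descriptor.write(text)


# Round trip through an in-memory file to exercise both methods.
buffer = io.StringIO()
NoSystemDawgTextBuilder.write_file(buffer, "hello world\n")
buffer.seek(0)
result = NoSystemDawgTextBuilder.read_file(buffer)
```

An instance of the real subclass would then be passed as the builder when running OCR on an image.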
I implemented a similar builder here, if it's helpful. In my case, I needed a modification to WordBoxBuilder with dictionary-related parameters set to false.
This is a work in progress, since I may modify the parameters further. Once it's complete, I'm happy to submit the builder as a pull request.
It would be great if you could just override tess_conf without having to extend the base builder.
--load_system_dawg 0 would be helpful as an argument in image_to_text, perhaps as an options dictionary. Feel free to call it something that makes it language-agnostic.
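To make the options-dictionary idea concrete, here is a rough sketch of how such a dictionary could be flattened into the argument form used earlier in this thread. The helper options_to_args is hypothetical and not part of pyocr, and the mapping of booleans to 1/0 is an assumption.

```python
def options_to_args(options):
    """Hypothetical helper: flatten an options dict into an argument list.

    Not part of pyocr. It only illustrates the proposal in this thread.
    """
    args = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: booleans are passed as 1/0.
            value = int(value)
        args.extend(["--" + name, str(value)])
    return args


args = options_to_args({"load_system_dawg": False})
```

Here args ends up in the same two-element form as the tess_conf list suggested above, so the caller would never have to write a tesseract-specific flag by hand.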