stccorp closed this 3 years ago
I was able to do it in a very simple way, in case anyone is interested. For some reason I was not getting reliable output when sending a binary object instead of a file. I have no idea why at this time. But creating files and sending one file at a time worked.
Attachment: ocr1.txt
yeah, cropping the image before sending it to tesseract is a good solution!
another way would be to recognize everything, but use the hocr()
option and select only the text present in the desired coordinates.
https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html#hocr-output
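For anyone who wants to try the hOCR route, here is a rough sketch in Python (standard library only) of the "select only the text in the desired coordinates" step. It assumes you already have hOCR output as a string (for example from `tesseract input.png out hocr`) and that each word is a `span` with class `ocrx_word` whose `title` starts with `bbox x1 y1 x2 y2`. The names `HocrWordParser`, `words_in_box` and the `SAMPLE` markup are made up for illustration, and the sketch does not handle spans nested inside a word span.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x1 y1 x2 y2" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x1, y1, x2, y2)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: no span is nested inside a word span.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def words_in_box(hocr, box):
    """Return the words whose bbox lies entirely inside box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return [
        text
        for text, (wx1, wy1, wx2, wy2) in parser.words
        if wx1 >= x1 and wy1 >= y1 and wx2 <= x2 and wy2 <= y2
    ]


# Hypothetical hOCR fragment, only to show the shape of the input.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 600 200'>
 <span class='ocr_line' title='bbox 10 10 480 40'>
  <span class='ocrx_word' title='bbox 10 10 90 40; x_wconf 96'>Invoice</span>
  <span class='ocrx_word' title='bbox 100 10 160 40; x_wconf 93'>No.</span>
  <span class='ocrx_word' title='bbox 400 10 480 40; x_wconf 91'>12345</span>
 </span>
</div>
"""

if __name__ == "__main__":
    print(words_in_box(SAMPLE, (0, 0, 200, 50)))
    print(words_in_box(SAMPLE, (390, 0, 500, 50)))
```

You would call `words_in_box` once per field, each with its own pixel box. Requiring the word to be fully inside the box is just one choice; an overlap test may work better if the boxes are tight.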
Hello, I am hoping someone can point me in the right direction. I want to be able to extract information from an image or PDF by specifying a bounding box in pixels (e.g. x1, y1, x2, y2) for each field. Does this library allow something like that?
Thank you