hey @robertknight, thanks for creating the ocrs Rust crate. I'm using it to detect text from cards (e.g. licenses), but detection is taking 70-120 seconds. I've tried downscaling, which brings it down to 30-40 seconds, but I can only go so far because beyond that the text is no longer detected accurately, since the original card image is already small. Is there any way I can speed up the detection process?
Can you provide an image or images representative of the ones you are trying to extract data from, along with some details of the system you are running the extraction on (what CPU? how many cores? etc.)? Make sure not to include identifiable information for a real person.
How many images are you processing in total in the time period that you quoted?
I have 16 GB RAM and an i7 CPU with 8 cores, and I'm processing a single image.
That single image takes 850 milliseconds on my i5 laptop with `ocrs image.png`. Are you using a release build, or at least building the `rten-*` dependencies in release mode? Debug builds of those crates will be extremely slow in comparison.
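One way to get optimized inference while keeping fast incremental builds of your own crate is Cargo's per-package profile overrides. A minimal sketch; the exact `rten-*` package names below are assumptions about what your dependency graph pulls in, so adjust them to match your `Cargo.lock`:

```toml
# Cargo.toml of your binary or workspace crate.
# Your own code stays unoptimized for quick rebuilds, but the numeric
# crates are compiled with optimizations so OCR runs at a usable speed.
[profile.dev.package.rten]
opt-level = 3

[profile.dev.package.rten-imageproc]
opt-level = 3

[profile.dev.package.rten-tensor]
opt-level = 3

# Or, more bluntly, optimize every dependency while leaving the
# workspace crates themselves in debug mode:
# [profile.dev.package."*"]
# opt-level = 3
```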
Yes, it is super fast with the CLI; I am using it as a library. I just saw #7. Thanks for the help @robertknight!
Input images from cameras etc. often have a much higher resolution than is needed to read the text. Downscaling the image can often produce the same output in much less time. This is because all of the steps in the pipeline that work directly on the input image have a lot less memory to move around and less computation to do if it is smaller.

As an example, I ran `ocrs` on an invoice I'd received from a tradesman recently. The photo of the invoice was 2479 x 3337 pixels and `ocrs` takes about 1.5s to process it on my Intel Mac. Downsizing to 30% of the original input size produces the same extracted output but runs nearly twice as fast (800-900ms). In some cases the input image really does need high resolution to make the text legible, so some mechanism to control this would be useful.
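For reference, here is a rough sketch of that kind of pre-downscaling using the `image` crate. The 0.3 scale factor and the file names are placeholders taken from this example; feeding the resized image into the ocrs library is not shown.

```rust
use image::{imageops::FilterType, GenericImageView};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the full-resolution photo (e.g. 2479 x 3337 pixels).
    let img = image::open("invoice.jpg")?;
    let (w, h) = img.dimensions();

    // Downscale to ~30% of the original size before running OCR.
    // Lanczos3 keeps text edges reasonably sharp when shrinking.
    let scaled = img.resize(
        (w as f32 * 0.3) as u32,
        (h as f32 * 0.3) as u32,
        FilterType::Lanczos3,
    );

    scaled.save("invoice_small.jpg")?;
    Ok(())
}
```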
Is there a known ideal image size or target text height? I'm testing on single-line text images and found that I can vertically concatenate them and run detect/recognize on the concatenated image for a huge speed gain. I can concatenate about 30-40 slices before the recognition accuracy starts to drop, so I imagine at some point I'm triggering internal scaling when the image is passed to the model, which then hurts detection or accuracy.
Looking at some of the model code, am I correct in thinking that the expected image size for detection is 800x600 and the individual detected lines will be scaled to 64px high for recognition?
> Looking at some of the model code, am I correct in thinking that the expected image size for detection is 800x600 and the individual detected lines will be scaled to 64px high for recognition?
That's correct. The input is padded to 800x600 if smaller or resized down if larger. In future I'd like to avoid the fixed input size for the detection model, as it is wasteful for small images. In the meantime vertically stacking small images is a good trick for better efficiency.
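As an illustration of the stacking trick, here is a hedged sketch using the `image` crate to tile single-line crops into one tall image before running detection/recognition on it once. The white background and the helper name are assumptions for the example; handing the canvas to the ocrs library and mapping detected boxes back to the original slices are not shown.

```rust
use image::{GenericImage, RgbImage};

/// Stack single-line images vertically onto one canvas so they can be
/// detected/recognized in a single pass instead of one call per line.
fn stack_lines(lines: &[RgbImage]) -> RgbImage {
    let width = lines.iter().map(|l| l.width()).max().unwrap_or(1);
    let height = lines.iter().map(|l| l.height()).sum::<u32>().max(1);

    // Start from a white canvas; scanned text is usually dark on light.
    let mut canvas = RgbImage::from_pixel(width, height, image::Rgb([255, 255, 255]));

    let mut y = 0;
    for line in lines {
        // Place each slice at the current vertical offset.
        canvas.copy_from(line, 0, y).expect("slice fits in canvas");
        y += line.height();
    }
    canvas
}
```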
Excellent, thanks! Really appreciate this project and excited to explore using RTen with custom ONNX models too.