kamui-fin / gazou

Japanese and Chinese OCR for Linux & Windows
GNU General Public License v3.0

Feature Request: Batch processing #3

Closed. Atreyagaurav closed this 3 years ago

Atreyagaurav commented 3 years ago

Is it possible to pass a sequence of images and then combine all the OCR results into a single text file?

It looks like there is no command-line option. Even just an option to pass an image file instead of taking a screen grab would help, since we could then write a bash script to collect the text from each image.

I'm not that good with C++ myself; otherwise I would try to use the functions you have created and apply them to an image file given on the command line.

Just gazou image-to-ocr.png or something like that would also work if you don't want to spend too much time making a full-fledged CLI.

That would reduce the overall preparation time and remove the need to wait for each OCR run to complete. (On a side note, an option to send a notification when OCR is complete might be good when reading whole pages.)

P.S. Thank you for this package, having it in AUR was a great help.

kamui-fin commented 3 years ago

I appreciate the feature request. Adding a command line mode would definitely be helpful for automation. It is now implemented and merged into master. The AUR package has been updated as well. Check out the readme for information on the new command line mode.

To get notifications, you can use notify-send from libnotify in the shell or a script.
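As a rough illustration of the batch workflow discussed above, here is a minimal Python sketch that runs an OCR command once per image, writes the combined text to one file, and can send a desktop notification through notify-send. The helper names (batch_ocr, notify) are hypothetical, and the assumption that the OCR command takes the image path as its last argument and prints the text to stdout is mine; the readme documents the actual command line syntax.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def batch_ocr(images: Sequence[Path], ocr_cmd: Sequence[str], out_path: Path) -> None:
    """Run ocr_cmd once per image and write all results to out_path."""
    with out_path.open("w", encoding="utf-8") as out:
        for image in images:
            # ASSUMPTION: the OCR tool accepts the image path as its final
            # argument and prints the recognized text to stdout.
            result = subprocess.run(
                [*ocr_cmd, str(image)],
                capture_output=True,
                encoding="utf-8",
                check=True,  # stop on the first failing image
            )
            # Ensure each page ends with exactly one newline.
            out.write(result.stdout.rstrip("\n") + "\n")


def notify(summary: str, body: str) -> None:
    """Send a desktop notification with notify-send (from libnotify)."""
    subprocess.run(["notify-send", summary, body], check=False)


# Example usage (the command prefix is a placeholder, see the readme):
#   pages = sorted(Path("pages").glob("*.png"))
#   batch_ocr(pages, ["gazou"], Path("ocr-output.txt"))
#   notify("OCR complete", f"{len(pages)} pages processed")
```

Passing the command as a list keeps the sketch independent of the exact flags, so only the ocr_cmd argument needs to change once the real syntax is known.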

Atreyagaurav commented 3 years ago

I get this error a lot when the image is already grayscale:

Error in pixConvertRGBToGray: pixs not 32 bpp
Error in pixScale: pixs not defined
Error in pixUnsharpMaskingGray: pixs not defined
Error in pixOtsuAdaptiveThreshold: pixs not defined or not 8 bpp
Error in pixSelectBySize: pixs not defined
Segmentation fault (core dumped)

The images I generated come from ImageMagick's import command, which seems to store a grayscale image if there aren't any other colors in the screenshot.

I thought that as long as I could use ImageMagick's convert to preprocess the images, I could still do batch processing and batch OCR. But even after some searching, I was unable to find a conversion from grayscale to RGB, only the usual RGB to gray. Besides, it is inefficient to convert an already grayscale image to RGB and back.

Could you make it skip the convert-to-gray step if the image has a single band?

Atreyagaurav commented 3 years ago

I found a workaround for the grayscale issue for now:

convert p01.png -define png:color-type=2 p01-rgb.png
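For batch use, the workaround only needs to be applied to files that are actually stored as grayscale. The following Python sketch reads the color type field from the PNG IHDR chunk (0 means grayscale, 2 means RGB, per the PNG specification) and runs the convert command from this comment only when needed. The function names are hypothetical, and treating only color type 0 as a problem is an assumption based on the single case reported in this thread.

```python
import struct
import subprocess
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
GRAYSCALE = 0  # PNG color type 0 = grayscale; 2 = RGB


def png_color_type(header: bytes) -> int:
    """Return the color type stored in the IHDR chunk of a PNG header."""
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width (4 bytes), height (4 bytes),
    # bit depth (1 byte), color type (1 byte), all big-endian.
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return color_type


def needs_rgb_conversion(path: Path) -> bool:
    """True if the PNG is stored as plain grayscale."""
    with path.open("rb") as f:
        return png_color_type(f.read(26)) == GRAYSCALE


def to_rgb(src: Path, dst: Path) -> None:
    """Apply the ImageMagick workaround quoted in this thread."""
    subprocess.run(
        ["convert", str(src), "-define", "png:color-type=2", str(dst)],
        check=True,
    )
```

Checking the header first avoids rewriting images that are already RGB, which keeps the preprocessing step cheap for mixed batches.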

More importantly, the output has lost its newlines. The whole page comes out as one long line, in both the terminal and the GUI.

I feel like I'm troubling you a lot. You can resolve the grayscale issue in your free time, but please resolve the newline issue first.

Atreyagaurav commented 3 years ago

I have opened a pull request: https://github.com/kamui-7/Gazou-OCR/pull/4

I have tried to resolve these issues; please merge it if it looks fine.