jwilk-archive / ocrodjvu

OCR for DjVu
GNU General Public License v2.0

Fix & document exit codes #6

Closed jwilk closed 9 years ago

jwilk commented 11 years ago

Issue reported by GStager at Bitbucket:

Could you give more information about error handling and return codes?

jwilk commented 11 years ago

What information exactly do you need?

jwilk commented 11 years ago

Comment submitted by GStager at Bitbucket:

Return codes: in general, when the "--on-error=resume" parameter is used, and when the OCR engine emits a "warning" message.

I use ocrodjvu for batch processing, and I need to differentiate between critical and non-critical errors in the OCR process.

jwilk commented 11 years ago

ocrodjvu doesn't pay attention to warnings generated by OCR engines.

--on-error=resume would normally cause the exit code to be 0, even when processing one of the pages failed. (I think that's a bug, though, so it's subject to change.)

ocrodjvu also exits with 0 if it was interrupted by Ctrl+C. This is of course a bug; I'll get it fixed.

jwilk commented 11 years ago

Comment submitted by GStager at Bitbucket:

Maybe stop execution with a non-zero exit code when some number of pages have failed (with --on-error=resume)?

jwilk commented 11 years ago

Comment submitted by GStager at Bitbucket:

Also, ocrodjvu should return a non-zero exit code with the --on-error=resume parameter if execution crashes. For example, processing the file http://libgen.org/get?nametype=md5&md5=000745AD512BDC7ABF4F9977D6137411 crashes:


- Page #417
*** page "419" not found
*** (djvused.cpp:371)
*** 'void verror(const char*, ...)'

Intermediate files were left in the '/tmp/ocrodjvu.hmnvxG' directory.
Traceback (most recent call last):
  File "/home/stager1/ocrodjvu-0.7.16/ocrodjvu", line 7, in <module>
    _.main(sys.argv)
  File "/home/stager1/ocrodjvu-0.7.16/lib/cli/ocrodjvu.py", line 542, in main
    context.process(options.path, options.pages)
  File "/home/stager1/ocrodjvu-0.7.16/lib/cli/ocrodjvu.py", line 524, in process
    self._process(*args, **kwargs)
  File "/home/stager1/ocrodjvu-0.7.16/lib/cli/ocrodjvu.py", line 514, in _process
    self._options.saver.save(document, pages_to_save, path, sed_file)
  File "/home/stager1/ocrodjvu-0.7.16/lib/cli/ocrodjvu.py", line 109, in save
    djvused.wait()
  File "/home/stager1/ocrodjvu-0.7.16/lib/ipc.py", line 114, in wait
    raise CalledProcessError(return_code, self.__command)
subprocess.CalledProcessError: Command 'djvused' returned non-zero exit status 1

Yet the return code is 0. The resulting DjVu file is not OCR'd.
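
Until the exit code is reliable, a batch script could inspect the captured stderr instead of trusting a zero status. This is only a minimal sketch: it assumes the crash shows up as a standard Python traceback on stderr, as in the log above, and `looks_crashed` is a hypothetical helper, not part of ocrodjvu.

```python
# Standard header that Python prints for an uncaught exception.
TRACEBACK_HEADER = "Traceback (most recent call last):"


def looks_crashed(returncode, stderr_text):
    """Guess whether a run crashed, even if it exited with status 0.

    Assumption: an uncaught exception leaves the usual traceback header
    in the captured stderr text.
    """
    if returncode != 0:
        # A non-zero status is already a clear failure signal.
        return True
    return TRACEBACK_HEADER in stderr_text
```

The text passed in would be whatever the batch job captured from the process, for example via `subprocess.run(..., capture_output=True, text=True)`.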

jwilk commented 11 years ago

Agreed. I'll get that fixed.

jwilk commented 11 years ago

Comment submitted by GStager at Bitbucket:

This file: http://libgen.org/get?nametype=orig&md5=6a9f527b3226f56f1fbccb7addbcba57 has only


- Page #1
No image suitable for OCR.

error on every page. As a result, the file is not OCR'd, but the exit code is 0.
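
A batch job can detect this case from the log text rather than from the exit code. The sketch below assumes the log format quoted above (a `- Page #N` line followed directly by the message) and that the caller has captured that output as a string. `all_pages_skipped` is a hypothetical helper, not an ocrodjvu function.

```python
import re

# Patterns taken from the log excerpt in this report.
PAGE_RE = re.compile(r"^- Page #\d+$")
NO_IMAGE = "No image suitable for OCR."


def all_pages_skipped(log_text):
    """Return True if every page header is followed by the no-image message."""
    lines = [line.strip() for line in log_text.splitlines()]
    pages = 0
    skipped = 0
    for index, line in enumerate(lines):
        if PAGE_RE.match(line):
            pages += 1
            # The message is assumed to be on the line right after the header.
            if index + 1 < len(lines) and lines[index + 1] == NO_IMAGE:
                skipped += 1
    # An empty log is not treated as "all skipped".
    return pages > 0 and pages == skipped
```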

jwilk commented 9 years ago

The problem with Ctrl+C was fixed in 0.7.17.

In 6bd704f66eb6afcd17bf961e6a2cdb4325f262cb, I made ocrodjvu exit with code 2 if it resumed from an error. This will be part of ocrodjvu 0.8.
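
For batch processing, this lets a caller tell "resumed from an error" apart from a clean run. Here is a minimal sketch of such a caller: the meaning of 0 and 2 follows the description above, while treating every other status as a critical failure is an assumption of this example. `classify` and `run_and_classify` are hypothetical names.

```python
import subprocess
import sys


def classify(returncode):
    """Map a process exit status to a coarse outcome label."""
    if returncode == 0:
        return "ok"
    if returncode == 2:
        # ocrodjvu 0.8 and later: it resumed after an error on some page.
        return "resumed"
    # Assumption: anything else (including death by signal, which
    # subprocess reports as a negative value) is a critical failure.
    return "failed"


def run_and_classify(command):
    """Run a command given as an argument list and classify its exit status."""
    completed = subprocess.run(command)
    return classify(completed.returncode)
```

A batch script would pass its usual ocrodjvu argument list, including `--on-error=resume`, to `run_and_classify` and queue the "resumed" files for manual review.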

I haven't decided yet what to do with the case when there's “No image suitable for OCR” on every page.

I'd like to credit you in the changelog. Could you tell me your full name?

jwilk commented 9 years ago

I've just released ocrodjvu 0.8.

Feel free to open a new bug about the case when there's “No image suitable for OCR” on every page.

jwilk commented 8 years ago

Comment submitted by fazulakis at Bitbucket:

The "No image suitable for OCR" error on every page still occurs for me with ocrodjvu 0.9.1 and tesseract. However, the images in my file are 600x600 dpi and very clear; in fact, tesseract works perfectly on each individual page image.

Anyway, a small glitch, thanks a lot Jakub for a fantastic program.