tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

Run do_ocr.py automatically when pages are not equal #66

Open ravidreams opened 8 years ago

ravidreams commented 8 years ago

Run do_ocr.py again automatically when the pages are not equal at the end of the first do_ocr.py run. Right now, it waits for user input.

jayantanth commented 8 years ago

It will create an endless loop: we are using a third-party tool (Google Drive) and the OCR depends on the quality of the scanned pages, so manual input is necessary. After the first run completes, the next three or four re-runs can be set to happen automatically. Any run after that should be started by the user, and there could be two options:

  1. re-run
  2. skip the pages that were not done

Skipping pages, as described in #38, can be added here to complete the full OCR process.

ravidreams commented 8 years ago

I see. How about limiting the automatic re-runs to 1 or 2 and then requesting manual input? This way, an endless loop can be avoided.
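
A minimal sketch of this bounded re-run idea, in Python since do_ocr.py is a Python script. It is not taken from the repository: `ocr_with_bounded_reruns`, `run_ocr_pass`, and `MAX_AUTO_RERUNS` are hypothetical names, and `run_ocr_pass` stands in for one OCR pass that returns the pages still missing.

```python
from typing import Callable, List

MAX_AUTO_RERUNS = 2  # assumed limit, following the "1 or 2" suggestion


def ocr_with_bounded_reruns(
    pages: List[int],
    run_ocr_pass: Callable[[List[int]], List[int]],
    max_auto_reruns: int = MAX_AUTO_RERUNS,
) -> List[int]:
    """Run one pass, then re-run automatically at most max_auto_reruns times.

    run_ocr_pass is a hypothetical callable: it takes the pages to OCR and
    returns the pages that still have no OCR output.
    Returns the pages that remain undone, so the user can decide what to do.
    """
    missing = run_ocr_pass(pages)  # the first, regular run
    reruns = 0
    while missing and reruns < max_auto_reruns:
        reruns += 1
        missing = run_ocr_pass(missing)  # retry only the pages that failed
    return missing


if __name__ == "__main__":
    # Fake pass for demonstration: page 3 never succeeds (e.g. a bad scan).
    def fake_pass(todo: List[int]) -> List[int]:
        return [p for p in todo if p == 3]

    leftover = ocr_with_bounded_reruns([1, 2, 3, 4], fake_pass)
    if leftover:
        print("Pages still not OCRed:", leftover)
        print("Options: 1. re-run  2. skip these pages")
```

Because the loop stops after a fixed number of retries, a page that Google Drive can never OCR cannot keep the script running forever; the manual choice is only asked for once the automatic attempts are used up.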

But this automatic feature is necessary if we are going to run a batch of files together without editing config.ini for every new file. When the tool moves to the cloud, this might be necessary.
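
To illustrate the batch case, here is a rough sketch under stated assumptions. `run_batch` and `ocr_file` are hypothetical names, not functions from the repository; `ocr_file` is assumed to run one OCR pass over a named file, skip pages that are already done, and return the page numbers that are still missing.

```python
from typing import Callable, Dict, Iterable, List


def run_batch(
    files: Iterable[str],
    ocr_file: Callable[[str], List[int]],
    max_auto_reruns: int = 1,
) -> Dict[str, List[int]]:
    """OCR each file without prompting; report pages left undone per file.

    ocr_file is a hypothetical callable: it runs one OCR pass over the named
    file and returns the page numbers that still have no OCR output.
    """
    unfinished: Dict[str, List[int]] = {}
    for name in files:
        missing = ocr_file(name)  # first pass for this file
        for _ in range(max_auto_reruns):
            if not missing:
                break
            # Assumed: a repeated pass only touches pages not yet OCRed.
            missing = ocr_file(name)
        if missing:
            unfinished[name] = missing  # left for manual follow-up
    return unfinished


if __name__ == "__main__":
    attempts: Dict[str, int] = {}

    # Fake OCR for demonstration: b.pdf succeeds on the second pass,
    # c.pdf always leaves pages 2 and 7 undone.
    def fake_ocr(name: str) -> List[int]:
        attempts[name] = attempts.get(name, 0) + 1
        if name == "b.pdf" and attempts[name] < 2:
            return [5]
        if name == "c.pdf":
            return [2, 7]
        return []

    print(run_batch(["a.pdf", "b.pdf", "c.pdf"], fake_ocr))
```

The batch never stops to wait for input; files that still have missing pages are collected and reported at the end, where the re-run or skip decision can be made once for the whole batch.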

bodhisattwawiki commented 7 years ago

This is needed. It happens to me almost every time. At least run do_ocr.py a second time automatically if some pages are not OCRed. After that we can do it manually.