tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

Mediawiki_uploader.py not running if there is a missing page #79

Closed. ravidreams closed this issue 8 years ago.

ravidreams commented 8 years ago

Mediawiki_uploader.py does not run if a page is missing.

I tried creating this page manually and also tried the following commands:

touch page_00001.txt
touch page_00001.upload

This is a recurring problem for many files. Google does not OCR these pages, and do_ocr.py gets stuck when we try running it again.

Logged in to https://ta.wikisource.org
INFO:root:Checking for bot access rights
INFO:root:The user Ravidreamsbot has bot access.
INFO:root: Done. Uploaded all text files to wiki source

mv: cannot stat ‘all_textfor’: No such file or directory
mv: cannot stat ‘OCR_’: No such file or directory
mv: cannot stat ‘upload-*’: No such file or directory
mv: cannot stat ‘செந்தமிழ்ப்_பெட்டகம்-2.pdf’: No such file or directory

tha-uzhavan commented 8 years ago

I think the issue was caused by Internet connectivity. When I reran do_ocr.py, all pages were converted properly and the text was uploaded to ta.wikisource.

tshrinivasan commented 8 years ago

When Google cannot OCR a few of the files, run the following command:

python create_dummy_files.py

This creates dummy text files for the incomplete PDF files.

Then run do_ocr.py again to complete all the pending work:

python do_ocr.py
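The thread does not show what create_dummy_files.py actually does. As a rough illustration of the idea only, the sketch below writes a placeholder text file for every page that has no OCR output yet. The file layout is a hypothetical assumption: split pages named like `page_00001.pdf`, with OCR output in `page_00001.txt` in the same directory (the `.txt` name follows the `touch` example in the issue). The function name `create_dummy_files` and the placeholder text are also illustrative, not the project's real code.

```python
from pathlib import Path

# Hypothetical placeholder text; the real script's content is not shown in this thread.
PLACEHOLDER = "OCR failed for this page.\n"


def create_dummy_files(workdir, page_ext=".pdf"):
    """Write a placeholder .txt next to every page file that has no OCR output.

    Assumes (hypothetically) pages named like page_00001.pdf whose OCR text
    would be saved as page_00001.txt in the same directory.
    Returns the names of the placeholder files that were created.
    """
    created = []
    for page in sorted(Path(workdir).glob("page_*" + page_ext)):
        text_file = page.with_suffix(".txt")
        # Leave existing OCR output untouched; only fill the gaps.
        if not text_file.exists():
            text_file.write_text(PLACEHOLDER, encoding="utf-8")
            created.append(text_file.name)
    return created
```

Under these assumptions, calling `create_dummy_files("some_workdir")` returns the names of the placeholders it wrote, and a second call returns an empty list because every page now has a text file. Skipping pages that already have output means real OCR results are never overwritten.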