tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

For missing pages #32

tha-uzhavan closed this issue 8 years ago.

tha-uzhavan commented 8 years ago

Today I found that a few pages were missing after running 'do_ocr.py'. I had to find the numbers of the missing .txt files manually among 789 files. Instead, please change the code so that it reports the missing file numbers.

After that, I placed dummy files with the missing numbers in the same folder. Each dummy file contains [[Category: OCR to be done]]. Then I ran the second command, 'mediawiki_uploader.py', and it completed the rest of the work without problems. So the terminal should not need to show the message "Re do the OCR".

Most of us have unreliable Internet connections. If possible, redo only the missing pages, or add the dummy category pages as I described above. Srini, call me as usual.
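The two manual steps described above (finding which page numbers have no .txt file, then dropping in a placeholder page) can be scripted. This is a minimal sketch, not code from OCR4wikisource: it assumes the output files are named by page number (e.g. "1.txt" to "789.txt"), and the helper names `find_missing_pages` and `write_placeholders` are hypothetical.

```python
import re
from pathlib import Path


def find_missing_pages(folder, expected_count):
    """Return page numbers in 1..expected_count that have no "<number>.txt" file.

    Hypothetical naming assumption: one output file per page, named by its
    page number ("1.txt", "2.txt", ...); other .txt files are ignored.
    """
    present = set()
    for path in Path(folder).glob("*.txt"):
        # Only count files whose name (without extension) is all digits.
        if re.fullmatch(r"\d+", path.stem):
            present.add(int(path.stem))
    return [n for n in range(1, expected_count + 1) if n not in present]


def write_placeholders(folder, missing, text="[[Category: OCR to be done]]"):
    """Create a dummy page file for each missing number, as done by hand in the issue."""
    for n in missing:
        Path(folder, f"{n}.txt").write_text(text, encoding="utf-8")
```

For example, `find_missing_pages("book", 789)` would list the gaps in a 789-page run, and passing that list to `write_placeholders` reproduces the workaround of tagging those pages with the "OCR to be done" category before uploading.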

tshrinivasan commented 8 years ago

Which version are you running?

If you find that some files were not OCRed, just run the same do_ocr.py again in the same folder.

You do not need to make any manual changes.

It will find the missing files and OCR only those.
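The rerun behaviour described in this reply (skip pages that already have output, OCR only the rest) can be illustrated with a small resume loop. This is a hedged sketch of the general idea, not do_ocr.py's actual implementation: the folder layout (page images as "<name>.jpg", OCR output as "<name>.txt") and the caller-supplied `ocr_page` function are hypothetical.

```python
from pathlib import Path


def pages_needing_ocr(image_dir, text_dir):
    """Return page images with no non-empty .txt output yet (hypothetical layout)."""
    pending = []
    for image in sorted(Path(image_dir).glob("*.jpg")):
        txt = Path(text_dir) / (image.stem + ".txt")
        # A missing or empty output file means an earlier run did not finish this page.
        if not txt.exists() or txt.stat().st_size == 0:
            pending.append(image)
    return pending


def resume_ocr(image_dir, text_dir, ocr_page):
    """OCR only the pending pages; ocr_page(image_path) -> str is supplied by the caller."""
    done = []
    for image in pages_needing_ocr(image_dir, text_dir):
        try:
            text = ocr_page(image)
        except Exception as exc:  # e.g. a dropped connection: leave the page pending
            print(f"OCR failed for {image.name}: {exc}")
            continue
        Path(text_dir, image.stem + ".txt").write_text(text, encoding="utf-8")
        done.append(image.stem)
    return done
```

Because a failed page simply has no output file, running the same command again picks up where the previous run stopped, which suits unreliable connections better than aborting the whole batch.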

Always use the latest version.

tha-uzhavan commented 8 years ago

Yes, I learned the 'git pull' command from Ravi.