jayantanth opened 8 years ago
Can you try manually uploading the missing PDF files to Google Drive and getting the text? Save the text under the same name (e.g. page_00099.txt) in the same folder.
Then run do_ocr.py to confirm that there are no errors.
I have tried manually, but Google Drive itself returns an error, as shown in the image above.
In the same folder, run the following commands:
touch page_00099.txt
touch page_00099.upload
This makes do_ocr.py skip that page in its check. Do the same for every missing page, creating empty placeholder files.
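The manual `touch` workaround can be applied to several pages at once with a short Python helper. This is a minimal sketch, not part of the project: the name `create_placeholders` is hypothetical, and it assumes, as the commands above suggest, that an empty `.txt` file plus an empty `.upload` file per page is enough for do_ocr.py to skip that page.

```python
from pathlib import Path


def create_placeholders(folder, page_stems):
    """Create empty <stem>.txt and <stem>.upload files for each page stem.

    Hypothetical helper mirroring `touch page_00099.txt` and
    `touch page_00099.upload`. Files that already exist are left
    completely untouched, so any text saved earlier is kept.
    Returns the names of the files that were newly created.
    """
    folder = Path(folder)
    created = []
    for stem in page_stems:
        for suffix in (".txt", ".upload"):
            path = folder / f"{stem}{suffix}"
            if not path.exists():
                path.touch()  # creates an empty file
                created.append(path.name)
    return created
```

For example, calling `create_placeholders(".", ["page_00099", "page_00267"])` from the book's folder would cover both pages that do_ocr.py reports as missing in this thread. Unlike `touch`, the helper does not update timestamps on files that already exist.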
Today I am running it on a 723-page book; only two pages get stuck every time.
=========ERROR===========
INFO:main:Missing page_00099.txt
INFO:main:page_00099.pdf should be reuploaded
INFO:main:Missing page_00267.txt
INFO:main:page_00267.pdf should be reuploaded
INFO:main:Text files are not equal to PDF files. Some PDF files not OCRed. Run this script again to complete OCR all the PDF files
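Judging only from these log messages, do_ocr.py appears to compare the `page_*.pdf` files against the `page_*.txt` files in the folder. The sketch below is a hypothetical reimplementation of that comparison, not the script's actual code (`find_missing_text` is an illustrative name), which lists every page that still lacks a text file so they can all be handled in one pass.

```python
from pathlib import Path


def find_missing_text(folder):
    """Return sorted stems of page_*.pdf files with no matching .txt file.

    Hypothetical sketch of the check suggested by the do_ocr.py log;
    it compares file names only and does not inspect file contents.
    """
    folder = Path(folder)
    pdf_stems = {p.stem for p in folder.glob("page_*.pdf")}
    txt_stems = {p.stem for p in folder.glob("page_*.txt")}
    return sorted(pdf_stems - txt_stems)
```

Run in the book's folder, it returns names such as `page_00099` and `page_00267` instead of requiring them to be read off the log one by one.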
Then I tried uploading manually. The error occurs in Google Drive itself when converting page_00099.pdf to text.
So the remaining issue is this: without completing this job I cannot run mediawiki_upload.py, because no "text_for_page" file is available. Every time, it gets stuck at page 99 and this message appears: "Text files are not equal to PDF files. Some PDF files not OCRed. Run this script again to complete OCR all the PDF files". I know this is not directly an issue with your script; it is a Google Drive issue.