Installed version 1.50 with bash ./setup.sh.
nasirkhan@nasir:~/wiki/OCR4wikisource$ python do_ocr.py
INFO:main:Running do_ocr.py 1.50
INFO:root:Operating System = "Ubuntu 14.04.3 LTS"
INFO:main:URL = https://bn.wikisource.org/wiki/image:OCR-test-1.djvu
INFO:main:Columns = 1
INFO:main:Wiki Username = nasirkhan
INFO:main:Wiki Password = Not logging the password
INFO:main:Wiki Source Language Code = bn
INFO:main:Keep Temp folder in Google Drive = yes
INFO:main:Original URL = https://bn.wikisource.org/wiki/image:OCR-test-1.djvu
INFO:main:File Name = image:OCR-test-1.djvu
INFO:main:File Type = djvu
INFO:main:Created Temp folder OCR-image:OCR-test-1.djvu-temp-2016-02-13-17-58-41
Downloading the file image:OCR-test-1.djvu
INFO:main:Downloading the file image:OCR-test-1.djvu
INFO:urllib3.connectionpool:Starting new HTTPS connection (1): bn.wikisource.org
[################################] 11/11 - 00:00:00
INFO:main:Download Completed
INFO:main:Found a djvu file. Converting to PDF file.
ddjvu: [1-15114] IFFByteStream not ready for reading chunk.
ddjvu: [1-15114] IFFByteStream not ready for reading chunk.
ddjvu: Cannot decode document.
INFO:main:Running ddjvu --format=pdf "image:OCR-test-1.djvu" "image:OCR-test-1".pdf
INFO:main:Aligining the Pages of PDF file.
INFO:main:Running mutool poster -x 1 "image:OCR-test-1.pdf" currentfile.pdf
error: cannot open image:OCR-test-1.pdf
error: cannot load document 'image:OCR-test-1.pdf'
uncaught exception: cannot load document 'image:OCR-test-1.pdf'
INFO:main:Spliting the PDF into single pages.
Error: Unable to find file.
Error: Failed to open PDF file:
currentfile.pdf
Done. Input errors, so no output created.
INFO:main:Running pdftk currentfile.pdf burst
INFO:main:Joining the PDF files ...
INFO:main:
Creating a folder in Google Drive to upload files. Folder Name : OCR-image:OCR-test-1.djvu-temp-2016-02-13-17-58-41
INFO:main:Running gdmkdir.py "OCR-image:OCR-test-1.djvu-temp-2016-02-13-17-58-41" | tee folder_in_google_drive.log
id: 0Bzu8oam42f2mY3hONEV3RzFyTGc
drive view: https://drive.google.com/drive/folders/0Bzu8oam42f2mY3hONEV3RzFyTGc
folder view: https://docs.google.com/folderview?id=0Bzu8oam42f2mY3hONEV3RzFyTGc&usp=drivesdk
INFO:main:Split the text files to sync with the original images
INFO:main:Joining text files based on Column No
INFO:main:
Moving all temp files to OCR-image:OCR-test-1.djvu-temp-2016-02-13-17-58-41
INFO:main:Running mv folder_.log currentfile.pdf docdata.txt pg.pdf page* txt* "OCR-image:OCR-test-1.djvu-temp-2016-02-13-17-58-41"
mv: cannot stat ‘currentfile.pdf’: No such file or directory
mv: cannot stat ‘docdata.txt’: No such file or directory
mv: cannot stat ‘pg.pdf’: No such file or directory
mv: cannot stat ‘page_’: No such file or directory
mv: cannot stat ‘txt’: No such file or directory
INFO:main:Merged all OCRed files to all_text_for_image:OCR-test-1.djvu.txt
INFO:main:Making a copy of all text files to text-for-image:OCR-test-1.djvu
INFO:main:Running cp .txt text-for-image:OCR-test-1.djvu
INFO:main:
Done. Check the text files start with text_forpage
INFO:main:
The PDF files and result text files are equval. Now running the mediawiki_uploader.py script
INFO:main:Running do_ocr.py 1.50
INFO:root:Operating system = "Ubuntu 14.04.3 LTS"
INFO:main:URL = https://bn.wikisource.org/wiki/image:OCR-test-1.djvu
INFO:main:Columns = 1
INFO:main:Wiki Username = nasirkhan
INFO:main:Wiki Password = Not logging the password
INFO:main:Wiki Source Language Code = bn
INFO:main:Edit Summary = testing...
INFO:main:File Name = image:OCR-test-1.djvu
INFO:main:File Type = djvu
INFO:main:Original URL = https://bn.wikisource.org/wiki/image:OCR-test-1.djvu
INFO:main:Wiki URL = https://bn.wikisource.org/w/api.php
INFO:root:Login Status = True
INFO:root:
Logged in to https://bn.wikisource.org
INFO:root:Checking for bot access rights
INFO:root:The user nasirkhan does not have bot access
INFO:root:
Done. Uploaded all text files to wiki source
mv: cannot stat ‘upload-*’: No such file or directory
nasirkhan@nasir:~/wiki/OCR4wikisource$
http://paste.ubuntu.com/15035323/
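In this log, ddjvu reports "Cannot decode document". Even so, mutool, pdftk, mv and the uploader still run, and the run ends with "Done. Uploaded all text files to wiki source".

Below is a minimal Python sketch of two guards. The first checks the downloaded file before conversion. The second stops at the first command that exits non-zero. The helper names (looks_like_djvu, run_step, convert_djvu_to_pdf, main) are hypothetical and not taken from do_ocr.py. The sketch also assumes ddjvu exits with a non-zero status when it cannot decode its input. The magic-byte check relies on DjVu files beginning with the bytes "AT&TFORM".

```python
import subprocess
import sys

# A DjVu file begins with the 4-byte "AT&T" magic followed by an IFF "FORM" chunk id.
DJVU_MAGIC = b"AT&TFORM"


def looks_like_djvu(path):
    """Return True if the file at `path` starts with the DjVu magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(DJVU_MAGIC)) == DJVU_MAGIC


def run_step(cmd):
    """Run one external command; raise CalledProcessError if it exits non-zero."""
    print("Running " + " ".join(cmd))
    # Passing a list (no shell) means no quoting is needed for names like "image:OCR-test-1.djvu".
    subprocess.run(cmd, check=True)


def convert_djvu_to_pdf(djvu_path, pdf_path):
    """Hypothetical helper: validate the input, then convert it with ddjvu."""
    if not looks_like_djvu(djvu_path):
        raise ValueError(f"{djvu_path} is not a DjVu file; check what was downloaded")
    run_step(["ddjvu", "--format=pdf", str(djvu_path), str(pdf_path)])


def main(argv):
    """Hypothetical entry point: argv is [djvu_path, pdf_path]; returns an exit code."""
    try:
        convert_djvu_to_pdf(argv[0], argv[1])
    except (ValueError, OSError, subprocess.CalledProcessError) as exc:
        # Stop here instead of running mutool, pdftk, mv and the uploader on missing files.
        print(f"Aborting: {exc}", file=sys.stderr)
        return 1
    return 0
```

For example, sys.exit(main(["image:OCR-test-1.djvu", "image:OCR-test-1.pdf"])) would stop before the mutool step if the downloaded file is not DjVu or if ddjvu fails. A failed step would then not be followed by a "Done" message.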