jayantanth closed this issue 8 years ago
This is caused by the issue with do_ocr.py not running.
Closing this now.
Reopen if you still get any issue with mediawiki_uploader.py.
I found the same error.
mediawiki_uploader_2016-01-05-12-19-54_log.txt
do_OCR.py ran successfully.
I got the same error while running mediawiki_uploader.py.
Log attached.
Facing the same issue on the "or" (Odia) wiki too.
Just wondering if this is a problem because of the Indian numerals in the page-number URLs?
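As a rough check on the numerals-in-URLs question, the sketch below shows how a page title ending in a Bengali digit would normally be percent-encoded for a URL. It is only an illustration written for Python 3: the `page_url` helper and the example title are hypothetical and are not taken from mediawiki_uploader.py.

```python
from urllib.parse import quote, unquote

# Hypothetical base URL for illustration; only the host appears in the log.
WIKI_BASE = "https://bn.wikisource.org/wiki/"


def page_url(title):
    """Return a URL for `title`, percent-encoding any non-ASCII characters.

    quote() encodes the text as UTF-8 first and leaves "/" unescaped by default.
    """
    return WIKI_BASE + quote(title)


if __name__ == "__main__":
    # "\u09e7" is the Bengali digit one.
    url = page_url("Testocrbengali.pdf/\u09e7")
    print(url)
    # Decoding the escaped part gives back the original title.
    assert unquote(url[len(WIKI_BASE):]) == "Testocrbengali.pdf/\u09e7"
```

Encoded this way, the digit travels as plain ASCII escapes, which suggests the numerals are not a problem for the URL itself as long as the title is handled as text before it is encoded.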
Just confirming that I tested on both bn and or. It works now.
mediawiki_uploader_2016-01-05-08-19-35_log.txt
mediawiki_uploader.py Error Log
jayanta@jayanta-Inspiron-3541:~$ cd OCR
jayanta@jayanta-Inspiron-3541:~/OCR$ python mediawiki_uploader.py
INFO:main:Running mediawiki_uploader.py Version 1.31
INFO:main:URL = https://upload.wikimedia.org/wikisource/bn/2/2f/Testocrbengali.pdf
INFO:main:Columns = 1
INFO:main:Wiki Username = jayantanth
INFO:main:Wiki Password = Not logging the password
INFO:main:Wiki Source Language Code = bn
INFO:main:File Name = Testocrbengali.pdf
INFO:main:File Type = pdf
INFO:main:Original URL = https://upload.wikimedia.org/wikisource/bn/2/2f/Testocrbengali.pdf
INFO:main:Wiki URL = https://bn.wikisource.org/w/api.php
INFO:root:Login Status = True
INFO:root:
Logged in to https://bn.wikisource.org
Traceback (most recent call last):
  File "mediawiki_uploader.py", line 170, in
    pagename = filename + "/" + str(convert_to_indic(wikisource_language_code, pageno))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
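For anyone hitting the same traceback, here is a minimal sketch of what may be going on. It assumes that convert_to_indic returns UTF-8 bytes and that filename is a unicode string, so Python 2 tries an implicit ASCII decode when the two are joined. Neither assumption is confirmed by the log. The helpers `to_bengali_digits`, `build_pagename`, and `implicit_ascii_decode` are hypothetical stand-ins written for Python 3, not the script's actual code.

```python
BENGALI_ZERO = 0x09E6  # code point of the Bengali digit zero


def to_bengali_digits(number):
    """Hypothetical stand-in for convert_to_indic(): map 0-9 to Bengali digits."""
    return "".join(chr(BENGALI_ZERO + int(ch)) for ch in str(number))


def build_pagename(filename, pageno):
    """Build 'File.pdf/<page>' from text strings only, never from raw bytes."""
    return "{}/{}".format(filename, to_bengali_digits(pageno))


def implicit_ascii_decode(raw_bytes):
    """Mimic the ASCII decode Python 2 performs when bytes meet unicode."""
    return raw_bytes.decode("ascii")


if __name__ == "__main__":
    encoded = to_bengali_digits(1).encode("utf-8")  # first byte is 0xe0
    try:
        implicit_ascii_decode(encoded)
    except UnicodeDecodeError as exc:
        print("Reproduced:", exc)
    # Keeping everything as text avoids the implicit decode entirely.
    assert build_pagename("Testocrbengali.pdf", 12) == "Testocrbengali.pdf/\u09e7\u09e8"
```

If those assumptions hold, the usual remedy in Python 2 would be to decode the converted page number with UTF-8 (or keep it as unicode throughout) before concatenating, rather than wrapping it in str().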
mediawiki_uploader_2016-01-05-00-09-30_log.txt