tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

sh: 1: Syntax error: "(" unexpected #26

Closed: jayantanth closed this issue 8 years ago

jayantanth commented 8 years ago

jayanta@jayanta-Inspiron-3541:~/OCR$ python do_ocr.py
INFO:main:Running do_ocr.py Version 1.34
INFO:root:Operating system = "Ubuntu 12.04.5 LTS"

INFO:main:URL = https://upload.wikimedia.org/wikipedia/commons/3/33/%E0%A6%97%E0%A7%80%E0%A6%A4%E0%A6%BF%E0%A6%97%E0%A7%81%E0%A6%9E%E0%A7%8D%E0%A6%9C_%28%E0%A7%A7%E0%A7%AF%E0%A7%A9%E0%A7%A7%29_-_%E0%A6%85%E0%A6%A4%E0%A7%81%E0%A6%B2%E0%A6%AA%E0%A7%8D%E0%A6%B0%E0%A6%B8%E0%A6%BE%E0%A6%A6_%E0%A6%B8%E0%A7%87%E0%A6%A8.djvu
INFO:main:Columns = 1
INFO:main:Wiki Username = JoyBot
INFO:main:Wiki Password = Not logging the password
INFO:main:Wiki Source Language Code = bn
INFO:main:Keep Temp folder in Google Drive = yes
INFO:main:Original URL = https://upload.wikimedia.org/wikipedia/commons/3/33/গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu
INFO:main:File Name = গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu
INFO:main:File Type = djvu
INFO:main:Created Temp folder OCR-গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu-temp-2016-01-09-20-38-29

Downloading the file গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu

INFO:main:Downloading the file গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): upload.wikimedia.org
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
[################################] 5311/5311 - 00:00:11
INFO:main:Download Completed
INFO:main:Found a djvu file. Converting to PDF file.

sh: 1: Syntax error: "(" unexpected
INFO:main:Running ddjvu --format=pdf গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.pdf
INFO:main:Aligining the Pages of PDF file.

INFO:main:Running mutool poster -x 1 গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.pdf currentfile.pdf
sh: 1: Syntax error: "(" unexpected
INFO:main:Spliting the PDF into single pages.

Error: Failed to open PDF file: currentfile.pdf
Done. Input errors, so no output created.
INFO:main:Running pdftk currentfile.pdf burst
INFO:main: Creating a folder in Google Drive to upload files. Folder Name : OCR-গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu-temp-2016-01-09-20-38-29

INFO:main:Running gdmkdir.py OCR-গীতিগুঞ্জ_(১৯৩১)_-_অতুলপ্রসাদ_সেন.djvu-temp-2016-01-09-20-38-29 | tee folder_in_google_drive.log
sh: 1: Syntax error: "(" unexpected
Traceback (most recent call last):
  File "do_ocr.py", line 215, in <module>
    resultfile = open("folder_in_google_drive.log","r").readlines()
IOError: [Errno 2] No such file or directory: 'folder_in_google_drive.log'
jayanta@jayanta-Inspiron-3541:~/OCR$

do_ocr_2016-01-09-20-38-29_log.txt

jayantanth commented 8 years ago

OK, I may have found the cause of this bug. The file name contained "(" and ")". I renamed the file to remove the brackets. Perhaps the DjVu converter or mutool does not support "(" and ")" in file names, or perhaps Google Docs folder names do not support them.

Now it is working.

I will post again once the full job has completed.
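For reference, the `sh: 1: Syntax error: "(" unexpected` lines in the log above come from the shell itself, not from ddjvu, mutool, or Google Drive: an unquoted "(" in a command line is shell syntax. The following is only a minimal sketch of one common remedy, quoting each argument before the command string reaches the shell. It assumes do_ocr.py builds its commands as plain strings; `build_command` and the sample file name are hypothetical and are not taken from the project.

```python
import shlex


def build_command(program, *args):
    """Join a program and its arguments into one shell-safe command string.

    Hypothetical helper: every part is passed through shlex.quote, so
    characters such as "(" and ")" reach the program as literal text.
    """
    return " ".join(shlex.quote(part) for part in (program,) + args)


# Hypothetical file name containing brackets, similar to the one in the log.
name = "গীতিগুঞ্জ_(১৯৩১).djvu"
command = build_command("ddjvu", "--format=pdf", name, "output.pdf")
print(command)
```

On Python 2.7, which the log shows in use, the equivalent function is `pipes.quote`. Passing an argument list to `subprocess` without a shell would avoid the quoting problem altogether.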

jayantanth commented 8 years ago

After changing the name (removing the brackets), the file was OCRed and uploaded to Wikisource successfully.

tshrinivasan commented 8 years ago

In the Wikisource page names, did you keep or remove the brackets?

jayantanth commented 8 years ago

I removed the brackets!

https://bn.wikisource.org/wiki/%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A7%8D%E0%A6%98%E0%A6%A3%E0%A7%8D%E0%A6%9F:%E0%A6%97%E0%A7%80%E0%A6%A4%E0%A6%BF%E0%A6%97%E0%A7%81%E0%A6%9E%E0%A7%8D%E0%A6%9C-%E0%A6%85%E0%A6%A4%E0%A7%81%E0%A6%B2%E0%A6%AA%E0%A7%8D%E0%A6%B0%E0%A6%B8%E0%A6%BE%E0%A6%A6_%E0%A6%B8%E0%A7%87%E0%A6%A8.djvu

tshrinivasan commented 8 years ago

How often will brackets appear in file names?

Do you want me to handle this in the program, or can you remove the brackets from the file names manually?

jayantanth commented 8 years ago

Not frequently, but they are necessary for disambiguation pages. Leave it for the moment; we can manage it in the file name.
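The manual workaround described here, renaming the file so it has no brackets, could also be scripted. This is a rough sketch only; `strip_brackets` is a hypothetical helper and is not part of do_ocr.py.

```python
import re


def strip_brackets(file_name):
    """Return file_name with every "(" and ")" removed (hypothetical helper)."""
    return re.sub(r"[()]", "", file_name)


# Illustrative file name, not one from the thread.
print(strip_brackets("book_(1931)_-_author.djvu"))
```

Note that this changes the name used for the Wikisource pages as well, which matters where the brackets are needed for disambiguation.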

tshrinivasan commented 8 years ago

Fixed this in version 1.37.