Shreeshrii opened this issue 6 years ago
With version 3 of the script:
```
Moving all temp files to OCR-Hanuman Chalisa.pdf-temp-2018-05-21-08-05-14
INFO:__main__:Running mv folder*.log currentfile.pdf doc_data.txt pg*.pdf page* txt* *.jpg "OCR-Hanuman Chalisa.pdf-temp-2018-05-21-08-05-14"
mv: cannot stat ‘page_00001.jpg’: No such file or directory
mv: cannot stat ‘page_00002.jpg’: No such file or directory
INFO:__main__:Merged all OCRed files to all_text_for_Hanuman Chalisa.pdf.txt
INFO:__main__:Making a copy of all text files to text-for-Hanuman Chalisa.pdf
INFO:__main__:Running cp *.txt text-for-Hanuman Chalisa.pdf
cp: target ‘Chalisa.pdf’ is not a directory
```
The output folders are not created. All files stay in the main directory.
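One way to sidestep both errors is to do the file handling in Python instead of through shell commands. The sketch below assumes the script currently builds `mv` and `cp` command strings and runs them through a shell; the pattern list is copied from the `mv` line in the log, and the helper names (`collect_matches`, `move_temp_files`, `copy_text_files`) are hypothetical, not taken from do_ocr_jpg.py. A likely cause of the `mv: cannot stat` lines is that both `page*` and `*.jpg` match `page_00001.jpg`, so `mv` receives that name twice and fails the second time because the file has already been moved; the sketch de-duplicates the matches for that reason.

```python
import glob
import os
import shutil

# Patterns copied from the mv command in the log; adjust to match the script.
TEMP_PATTERNS = ["folder*.log", "currentfile.pdf", "doc_data.txt",
                 "pg*.pdf", "page*", "txt*", "*.jpg"]


def collect_matches(patterns, root="."):
    """Expand each glob pattern in root and drop duplicate paths.

    'page*' and '*.jpg' both match page_00001.jpg; keeping only the first
    occurrence avoids trying to move the same file twice.
    """
    seen = set()
    matches = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(root, pattern))):
            if path not in seen:
                seen.add(path)
                matches.append(path)
    return matches


def move_temp_files(dest_dir, patterns=TEMP_PATTERNS, root="."):
    """Move every matched temp file into dest_dir (created if missing).

    No shell is involved, so spaces or parentheses in dest_dir need no quoting.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_abs = os.path.abspath(dest_dir)
    for path in collect_matches(patterns, root):
        if os.path.abspath(path) != dest_abs:  # never move the target into itself
            shutil.move(path, dest_dir)


def copy_text_files(dest_dir, root="."):
    """Copy every *.txt file in root into dest_dir (created if missing)."""
    os.makedirs(dest_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(root, "*.txt"))):
        shutil.copy(path, dest_dir)
```

For example, `copy_text_files("text-for-Hanuman Chalisa.pdf")` creates the folder with the space intact and copies the text files into it, because the name is passed to the OS as a single argument rather than being split by a shell.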
Thanks
Will do a few more tests on this and add it to the code.
Errors from another file, with v2 of the script:
```
mv: cannot stat 'page_01087.jpg': No such file or directory
mv: cannot stat 'page_01088.jpg': No such file or directory
mv: cannot stat 'page_01089.jpg': No such file or directory
INFO:__main__:Merged all OCRed files to all_text_for_Mudgala Purana (Pothi or Oblong).pdf.txt
INFO:__main__:Making a copy of all text files to text-for-Mudgala Purana (Pothi or Oblong).pdf
INFO:__main__:Running cp *.txt text-for-Mudgala Purana (Pothi or Oblong).pdf
sh: 1: Syntax error: "(" unexpected
INFO:__main__:
Done. Check the text files start with text_for_page_
```
Edit: Looks like `sh: 1: Syntax error: "(" unexpected` has been reported previously as well.
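If the script keeps building the command as a string for a shell (the `sh: 1:` prefix suggests it runs through `/bin/sh`, as `os.system` does), quoting the destination with `shlex.quote` from the standard library fixes both the space and the parenthesis problems. This is a minimal sketch; `build_cp_command` is a hypothetical helper, not code from do_ocr_jpg.py.

```python
import shlex


def build_cp_command(dest_dir):
    """Return a shell command that copies all .txt files into dest_dir.

    The *.txt glob is left unquoted so the shell still expands it; the
    destination goes through shlex.quote so spaces and parentheses reach
    cp as one literal argument.
    """
    return "cp *.txt " + shlex.quote(dest_dir)


if __name__ == "__main__":
    print(build_cp_command("text-for-Mudgala Purana (Pothi or Oblong).pdf"))
    # cp *.txt 'text-for-Mudgala Purana (Pothi or Oblong).pdf'
```

Quoting only the destination matters: wrapping the whole command in quotes would also stop the shell from expanding `*.txt`.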
While testing do_ocr_jpg.py v2, I came across a problem related to spaces in the original file name.
I made the following changes to the copy statement.
The file I tested with:
https://ia800107.us.archive.org/3/items/Hanuman_Chalisa/Hanuman%20Chalisa.pdf
It is a 2 page pdf in Devanagari script.