Open Shreeshrii opened 6 years ago
When running do_ocr_jpg.py, the OCRed files are kept in the main directory as well as a couple of folders.
~/OCR4wikisource$ ls *.txt
all_text_for_2015.253393.Hanuman-Chalisa.pdf.txt  text_for_page_00004.txt  text_for_page_00011.txt  text_for_page_00018.txt  text_for_page_00025.txt  text_for_page_00032.txt
all_text_for_Hanuman Chalisa.pdf.txt  text_for_page_00005.txt  text_for_page_00012.txt  text_for_page_00019.txt  text_for_page_00026.txt  text_for_page_00033.txt
all_text_for_Mudgala Purana (Pothi or Oblong).pdf.txt  text_for_page_00006.txt  text_for_page_00013.txt  text_for_page_00020.txt  text_for_page_00027.txt  text_for_page_00034.txt
missing_files.txt  text_for_page_00007.txt  text_for_page_00014.txt  text_for_page_00021.txt  text_for_page_00028.txt
text_for_page_00001.txt  text_for_page_00008.txt  text_for_page_00015.txt  text_for_page_00022.txt  text_for_page_00029.txt
text_for_page_00002.txt  text_for_page_00009.txt  text_for_page_00016.txt  text_for_page_00023.txt  text_for_page_00030.txt
text_for_page_00003.txt  text_for_page_00010.txt  text_for_page_00017.txt  text_for_page_00024.txt  text_for_page_00031.txt
I suggest that, instead of being kept in the root ~/OCR4wikisource folder, these files be kept in a subfolder under it.
The mv command needed a small change. The following version, which wraps temp_folder in double quotes, works:
command = "mv folder*.log currentfile.pdf doc_data.txt pg*.pdf page* txt* text* " + '"' + temp_folder + '"'
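For comparison, the same cleanup could be done without building a shell string, which avoids quoting problems when a folder name contains spaces. This is only a minimal sketch, not the project's actual code: the function name move_outputs and its parameters are hypothetical, and the pattern list is copied from the mv command above. Note that, like that command, it would not pick up the all_text_for_*.txt files, since they do not start with "text".

```python
import glob
import os
import shutil

# Patterns taken from the mv command quoted in this issue.
PATTERNS = ["folder*.log", "currentfile.pdf", "doc_data.txt",
            "pg*.pdf", "page*", "txt*", "text*"]


def move_outputs(src_dir, temp_folder, patterns=PATTERNS):
    """Move files matching the patterns from src_dir into temp_folder.

    Hypothetical helper: returns the base names of the moved entries.
    """
    os.makedirs(temp_folder, exist_ok=True)
    dest = os.path.abspath(temp_folder)
    moved = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
            # Never try to move the destination folder into itself.
            if os.path.abspath(path) == dest:
                continue
            shutil.move(path, os.path.join(dest, os.path.basename(path)))
            moved.append(os.path.basename(path))
    return moved
```

It might be called as move_outputs(".", temp_folder) at the point where the script currently runs the mv command.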