tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

Put temporary OCR files in a folder #105

Open Shreeshrii opened 6 years ago

Shreeshrii commented 6 years ago

When do_ocr_jpg.py runs, the OCR output files are left in the main directory as well as in a couple of folders.

~/OCR4wikisource$ ls *.txt
all_text_for_2015.253393.Hanuman-Chalisa.pdf.txt       text_for_page_00004.txt  text_for_page_00011.txt  text_for_page_00018.txt  text_for_page_00025.txt  text_for_page_00032.txt
all_text_for_Hanuman Chalisa.pdf.txt                   text_for_page_00005.txt  text_for_page_00012.txt  text_for_page_00019.txt  text_for_page_00026.txt  text_for_page_00033.txt
all_text_for_Mudgala Purana (Pothi or Oblong).pdf.txt  text_for_page_00006.txt  text_for_page_00013.txt  text_for_page_00020.txt  text_for_page_00027.txt  text_for_page_00034.txt
missing_files.txt                                      text_for_page_00007.txt  text_for_page_00014.txt  text_for_page_00021.txt  text_for_page_00028.txt
text_for_page_00001.txt                                text_for_page_00008.txt  text_for_page_00015.txt  text_for_page_00022.txt  text_for_page_00029.txt
text_for_page_00002.txt                                text_for_page_00009.txt  text_for_page_00016.txt  text_for_page_00023.txt  text_for_page_00030.txt
text_for_page_00003.txt                                text_for_page_00010.txt  text_for_page_00017.txt  text_for_page_00024.txt  text_for_page_00031.txt

I suggest that these files be kept in a subfolder instead of the root /OCR4wikisource folder.

Shreeshrii commented 6 years ago

The mv command needed a small change. The following works:

command = "mv folder*.log currentfile.pdf doc_data.txt pg*.pdf page* txt* text* " + '"' + temp_folder + '"'
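
For comparison, the same move could be done without building a shell string, which avoids quoting problems when the folder name contains spaces (as the PDF names listed above do). This is only a sketch, not the script's actual code: the function name `move_temp_files` and the `work_dir` parameter are hypothetical, and the patterns are copied from the mv command above.

```python
import glob
import os
import shutil

# Glob patterns copied from the mv command above.
TEMP_PATTERNS = [
    "folder*.log", "currentfile.pdf", "doc_data.txt",
    "pg*.pdf", "page*", "txt*", "text*",
]


def move_temp_files(temp_folder, work_dir="."):
    """Move everything in work_dir that matches TEMP_PATTERNS into temp_folder.

    Hypothetical helper: the name and the work_dir parameter are assumptions
    made for this sketch. Returns the base names of the entries moved.
    """
    dest = os.path.abspath(temp_folder)
    os.makedirs(dest, exist_ok=True)
    moved = []
    for pattern in TEMP_PATTERNS:
        for path in sorted(glob.glob(os.path.join(work_dir, pattern))):
            # Skip the destination itself in case its name matches a pattern.
            if os.path.abspath(path) == dest:
                continue
            name = os.path.basename(path)
            shutil.move(path, os.path.join(dest, name))
            moved.append(name)
    return moved
```

Calling `move_temp_files(temp_folder)` from the directory where the script runs would then take the place of running the mv command. Files that match none of the patterns, such as missing_files.txt, stay where they are.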