sanskrit-coders / doc_curation

MIT License

Errors - not enough arguments for format string, missing clear_bad_chars_in_file #18

Open drdhaval2785 opened 3 years ago

drdhaval2785 commented 3 years ago

The errors thrown are as follows. The output is OK.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not enough arguments for format string
Call stack:
  File "auto_ocr.py", line 8, in <module>
    drive_ocr.split_and_ocr_on_drive(pdf_file, key_file, small_pdf_pages=25)
  File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf/drive_ocr.py", line 59, in split_and_ocr_on_drive
    end_page=end_page)
  File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf/__init__.py", line 28, in split_into_small_pdfs
    logging.info("Splitting %s into segments of %d", pdf_path)
Message: 'Splitting %s into segments of %d'
Arguments: ('ekavarnarthasangraha2.pdf',)
WARNING:2021-03-03 12:23:25,754:__init__:49 ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf exists
INFO:2021-03-03 12:23:25,755:drive_ocr:62 Do the OCR
INFO:2021-03-03 12:23:25,758:discovery:280 URL being requested: GET https://www.googleapis.com/discovery/v1/apis/drive/v3/rest
INFO:2021-03-03 12:23:26,445:drive:42 Uploading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf
INFO:2021-03-03 12:23:26,453:discovery:911 URL being requested: POST https://www.googleapis.com/upload/drive/v3/files?alt=json&uploadType=resumable
INFO:2021-03-03 12:23:26,453:transport:157 Attempting refresh to obtain initial access_token
INFO:2021-03-03 12:23:26,458:client:777 Refreshing access_token
INFO:2021-03-03 12:23:42,351:drive:54 Downloading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-03 12:23:42,363:discovery:911 URL being requested: GET https://www.googleapis.com/drive/v3/files/1Id2TGCu-hovO1vdE1g0N5BPFJVfrP8H_uSlH292evko/export?mimeType=text%2Fplain&alt=media
INFO:2021-03-03 12:23:43,265:drive:62 Done downloading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-03 12:23:43,265:drive:65 Deleting 1Id2TGCu-hovO1vdE1g0N5BPFJVfrP8H_uSlH292evko
INFO:2021-03-03 12:23:43,274:discovery:911 URL being requested: DELETE https://www.googleapis.com/drive/v3/files/1Id2TGCu-hovO1vdE1g0N5BPFJVfrP8H_uSlH292evko?
Traceback (most recent call last):
  File "auto_ocr.py", line 8, in <module>
    drive_ocr.split_and_ocr_on_drive(pdf_file, key_file, small_pdf_pages=25)
  File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf/drive_ocr.py", line 73, in split_and_ocr_on_drive
    file_helper.clear_bad_chars_in_file(file_path=final_ocr_path)
AttributeError: module 'curation_utils.file_helper' has no attribute 'clear_bad_chars_in_file'
vvasuki commented 3 years ago

What is your version of curation_utils?

(You can solve the problem by running sudo pip install git+https://github.com/sanskrit-coders/curation_utils/@master -U. I am just asking to see whether I must update PyPI.)
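
One way to report the installed versions is sketched below. It assumes the distribution names are curation_utils and doc_curation, as used in this thread. The helper name installed_version is hypothetical and is not part of either package. It falls back to pkg_resources because importlib.metadata only exists on Python 3.8 and later, while the traceback shows Python 3.6.

```python
def installed_version(package_name):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not part of doc_curation or
    curation_utils.
    """
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(package_name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for older interpreters such as Python 3.6 (needs setuptools).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Distribution names taken from this thread.
for name in ("curation_utils", "doc_curation"):
    print(name, installed_version(name))
```

A result of None for either name would mean that pip does not see that package in the interpreter running the script.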

drdhaval2785 commented 3 years ago

My version of curation_utils was 0.0.8. After updating curation_utils as shown above, it became 0.1.5.

The second error is gone, but the first one persists. See the log:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not enough arguments for format string
Call stack:
  File "auto_ocr.py", line 8, in <module>
    drive_ocr.split_and_ocr_on_drive(pdf_file, key_file, small_pdf_pages=25)
  File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf/drive_ocr.py", line 59, in split_and_ocr_on_drive
    end_page=end_page)
  File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf/__init__.py", line 28, in split_into_small_pdfs
    logging.info("Splitting %s into segments of %d", pdf_path)
Message: 'Splitting %s into segments of %d'
Arguments: ('ekavarnarthasangraha2.pdf',)
INFO:2021-03-04 09:59:04,460:drive_ocr:62 Do the OCR
INFO:2021-03-04 09:59:04,466:discovery:280 URL being requested: GET https://www.googleapis.com/discovery/v1/apis/drive/v3/rest
INFO:2021-03-04 09:59:05,021:drive:74 OCRing ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf to ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-04 09:59:05,021:drive:42 Uploading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf
INFO:2021-03-04 09:59:05,030:discovery:911 URL being requested: POST https://www.googleapis.com/upload/drive/v3/files?alt=json&uploadType=resumable
INFO:2021-03-04 09:59:05,031:transport:157 Attempting refresh to obtain initial access_token
INFO:2021-03-04 09:59:05,035:client:777 Refreshing access_token
INFO:2021-03-04 09:59:18,521:drive:54 Downloading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-04 09:59:18,533:discovery:911 URL being requested: GET https://www.googleapis.com/drive/v3/files/1zUnyCUpdvpjIMnJkV61L5vUTjsLJ91WK9_iDmtXXNY8/export?mimeType=text%2Fplain&alt=media
INFO:2021-03-04 09:59:19,406:drive:62 Done downloading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-04 09:59:19,407:drive:65 Deleting 1zUnyCUpdvpjIMnJkV61L5vUTjsLJ91WK9_iDmtXXNY8
INFO:2021-03-04 09:59:19,418:discovery:911 URL being requested: DELETE https://www.googleapis.com/drive/v3/files/1zUnyCUpdvpjIMnJkV61L5vUTjsLJ91WK9_iDmtXXNY8?
vvasuki commented 3 years ago

Can you confirm that your doc_curation package installation matches github?
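
For reference, the remaining "Logging error" comes from the call shown in the traceback: the format string has two placeholders (%s and %d), but only pdf_path is passed. The logging module catches the resulting TypeError inside the handler and prints it to stderr, which would explain why the output is still OK. Below is a minimal sketch of the likely fix. The parameter name small_pdf_pages is borrowed from the call in auto_ocr.py and is only an assumption about what split_into_small_pdfs calls its segment size. The logger name and the function log_split_start are hypothetical.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("split_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))


def log_split_start(pdf_path, small_pdf_pages):
    # Two placeholders need two arguments. The call in the traceback
    # passed only pdf_path, so "%d" had nothing to format.
    logger.info("Splitting %s into segments of %d", pdf_path, small_pdf_pages)


log_split_start("ekavarnarthasangraha2.pdf", 25)
print(stream.getvalue().strip())
```

With both arguments supplied, the message is formatted without the "--- Logging error ---" block.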