Open drdhaval2785 opened 3 years ago
What is your version of curation_utils?
(You can solve the problem by running sudo pip install git+https://github.com/sanskrit-coders/curation_utils/@master -U; I am just asking to see whether I must update PyPI.)
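For reference, the installed version can also be read from Python. This is a minimal sketch, assuming Python 3.8 or newer for importlib.metadata (on older interpreters such as 3.6, pkg_resources.get_distribution(name).version is the usual alternative). installed_version is a hypothetical helper name, not part of curation_utils.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if curation_utils is installed, otherwise None.
print(installed_version("curation_utils"))
```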
My version of curation_utils was 0.0.8. After updating it as shown above, it became 0.1.5.
The second error is gone, but the first one persists. See the log:
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
msg = self.format(record)
File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
return fmt.format(record)
File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
record.message = record.getMessage()
File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
msg = msg % self.args
TypeError: not enough arguments for format string
Call stack:
File "auto_ocr.py", line 8, in <module>
drive_ocr.split_and_ocr_on_drive(pdf_file, key_file, small_pdf_pages=25)
File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf/drive_ocr.py", line 59, in split_and_ocr_on_drive
end_page=end_page)
File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf/__init__.py", line 28, in split_into_small_pdfs
logging.info("Splitting %s into segments of %d", pdf_path)
Message: 'Splitting %s into segments of %d'
Arguments: ('ekavarnarthasangraha2.pdf',)
INFO:2021-03-04 09:59:04,460:drive_ocr:62 Do the OCR
INFO:2021-03-04 09:59:04,466:discovery:280 URL being requested: GET https://www.googleapis.com/discovery/v1/apis/drive/v3/rest
INFO:2021-03-04 09:59:05,021:drive:74 OCRing ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf to ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-04 09:59:05,021:drive:42 Uploading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf
INFO:2021-03-04 09:59:05,030:discovery:911 URL being requested: POST https://www.googleapis.com/upload/drive/v3/files?alt=json&uploadType=resumable
INFO:2021-03-04 09:59:05,031:transport:157 Attempting refresh to obtain initial access_token
INFO:2021-03-04 09:59:05,035:client:777 Refreshing access_token
INFO:2021-03-04 09:59:18,521:drive:54 Downloading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-04 09:59:18,533:discovery:911 URL being requested: GET https://www.googleapis.com/drive/v3/files/1zUnyCUpdvpjIMnJkV61L5vUTjsLJ91WK9_iDmtXXNY8/export?mimeType=text%2Fplain&alt=media
INFO:2021-03-04 09:59:19,406:drive:62 Done downloading ekavarnarthasangraha2_splits/ekavarnarthasangraha2_0001-0012.pdf.txt
INFO:2021-03-04 09:59:19,407:drive:65 Deleting 1zUnyCUpdvpjIMnJkV61L5vUTjsLJ91WK9_iDmtXXNY8
INFO:2021-03-04 09:59:19,418:discovery:911 URL being requested: DELETE https://www.googleapis.com/drive/v3/files/1zUnyCUpdvpjIMnJkV61L5vUTjsLJ91WK9_iDmtXXNY8?
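The traceback in the log above points at a logging call whose format string has two placeholders (%s and %d) but receives only one argument (pdf_path). Because the logging module catches formatting errors and reports them on stderr instead of raising, the run continues, which would explain why the OCR output is still fine. Below is a minimal sketch of the cause and a likely fix. The wrapper name log_split and the choice of small_pdf_pages as the second argument are assumptions for illustration; the actual variable used inside split_into_small_pdfs may differ.

```python
import logging

logging.basicConfig(level=logging.INFO)

MESSAGE = "Splitting %s into segments of %d"


def log_split(pdf_path, small_pdf_pages):
    # Hypothetical wrapper: pass one argument per placeholder (%s and %d).
    logging.info(MESSAGE, pdf_path, small_pdf_pages)


# Reproduce the underlying failure outside logging:
# one argument supplied for two placeholders.
try:
    MESSAGE % ("ekavarnarthasangraha2.pdf",)
except TypeError as err:
    error_text = str(err)

# With both arguments the message formats cleanly.
fixed = MESSAGE % ("ekavarnarthasangraha2.pdf", 25)

log_split("ekavarnarthasangraha2.pdf", 25)
```

Passing the arguments separately to logging.info, rather than pre-formatting the string, keeps formatting lazy: the message is only built if the record is actually emitted.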
Can you confirm that your doc_curation package installation matches github?
The errors thrown are as follows. The output is OK.