INFO:2020-06-03 21:53:59,585:pdf:117 Compress PDF...
INFO:2020-06-03 21:54:28,017:pdf:139 Compression by -0%.
INFO:2020-06-03 21:54:28,018:pdf:140 Final file size is 85.0MB
INFO:2020-06-03 21:54:32,580:discovery:280 URL being requested: GET https://www.googleapis.com/discovery/v1/apis/drive/v3/rest
INFO:2020-06-03 21:54:33,299:drive:37 Uploading amara_namachandrika_tiny_splits/amara_namachandrika_tiny_0001-0010.pdf
INFO:2020-06-03 21:54:33,308:discovery:911 URL being requested: POST https://www.googleapis.com/upload/drive/v3/files?alt=json&uploadType=resumable
INFO:2020-06-03 21:54:33,308:transport:157 Attempting refresh to obtain initial access_token
INFO:2020-06-03 21:54:33,313:client:777 Refreshing access_token
Traceback (most recent call last):
  File "auto_ocr.py", line 8, in <module>
    pdf.split_and_ocr_on_drive(pdf_file, key_file, small_pdf_pages=10)
  File "/home/dhaval/.local/lib/python3.6/site-packages/doc_curation/pdf.py", line 61, in split_and_ocr_on_drive
    drive_client.ocr_file(local_file_path=str(pdf_segment))
  File "/usr/local/lib/python3.6/dist-packages/curation_utils/google/drive.py", line 69, in ocr_file
    upload_result = self.upload(local_file_path=local_file_path)
  File "/usr/local/lib/python3.6/dist-packages/curation_utils/google/drive.py", line 43, in upload
    media_body=MediaFileUpload(local_file_path, mimetype=mime, resumable=True)
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/http.py", line 871, in execute
    _, body = self.next_chunk(http=http, num_retries=num_retries)
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/http.py", line 1046, in next_chunk
    self.resumable_uri, method="PUT", body=data, headers=headers
  File "/usr/local/lib/python3.6/dist-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/usr/local/lib/python3.6/dist-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/usr/local/lib/python3.6/dist-packages/httplib2/__init__.py", line 1991, in request
    cachekey,
  File "/usr/local/lib/python3.6/dist-packages/httplib2/__init__.py", line 1651, in _request
    conn, request_uri, method, body, headers
  File "/usr/local/lib/python3.6/dist-packages/httplib2/__init__.py", line 1589, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1356, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
I was able to OCR two works without any trouble. Since then, every new work fails with this timeout error. It may be related to the default timeout of the Google API client; if that default can be overridden, the upload might go through, but I am not sure.
The problem persisted even after I reduced the PDF splits to 10 pages each. The splits are only 1-2 MB in size, so I do not know why the request times out.
Increased the socket timeouts to 10 minutes.
For uploading a PDF and requesting OCR, the Google API client's default timeout of 1 minute may be too low.
It is now set to 10 minutes.
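For anyone hitting the same error in their own script, here is a minimal sketch of one way to raise the timeout using only the standard library. It is not necessarily how the fix was made in curation_utils. The helper name `raise_default_socket_timeout` and the constant `UPLOAD_TIMEOUT_SECONDS` are made up for illustration, and the sketch assumes the Drive client's HTTP object is created after this call and without an explicit timeout of its own.

```python
import socket

# Illustrative value matching the 10 minutes mentioned above.
UPLOAD_TIMEOUT_SECONDS = 10 * 60


def raise_default_socket_timeout(seconds=UPLOAD_TIMEOUT_SECONDS):
    """Set the process-wide default timeout for newly created sockets.

    Returns the previous default so the caller can restore it later.
    Assumption: the HTTP client used for the upload does not set its own
    timeout, so sockets it opens afterwards pick up this default.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    return previous


# Call this before building the Drive client and starting the upload.
previous_timeout = raise_default_socket_timeout()
```

This only affects sockets created after the call, so it has to run before the Drive service is built. If the HTTP object is constructed with an explicit timeout elsewhere, that value would likely take precedence over this default.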