self._http = urllib3.PoolManager(
    timeout=urllib3.Timeout.DEFAULT_TIMEOUT,
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where(),
    retries=urllib3.Retry(
        total=5,
        backoff_factor=0.2,
        status_forcelist=[500, 502, 503, 504]
    )
)
And from urllib3's ConnectionPool, line 725:
if not conn:
    # Try again
    log.warning("Retrying (%r) after connection "
                "broken by '%r': %s", retries, err, url)
    return self.urlopen(method, url, body, headers, retries,
                        redirect, assert_same_host,
                        timeout=timeout, pool_timeout=pool_timeout,
                        release_conn=release_conn, body_pos=body_pos,
                        **response_kw)
It seems the timeout causes the error, but the retry fails (hangs).
It could be caused by urllib3. Can you post more information about the process that is causing this issue? Is there a pattern to it? Is it happening only with big files? What version of Python are you using? Can you post more information about the S3 server you are connecting to? We will try to reproduce the issue locally.
Sorry for the late reply. We are running an upload service for our lab that checks for new, fairly large datasets (up to 20GB) and uploads them to a Minio server running in a Docker Swarm at our data center. It seems to happen with smaller datasets (a few hundred MB) as well. The uploader uses Python 3.5, and it seems the Minio Python driver we installed with pip3 is not the latest version I find here, as the arguments for the MinioClient include only a timeout and not the full HttpClient object. Maybe upgrading the driver would already help?
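For reference, here is a minimal sketch of wiring a pool like the one in the first snippet into the newer client via its http_client argument (the endpoint, credentials, and timeout values below are placeholders, not taken from this report):

import certifi
import urllib3
from minio import Minio

# Reuse the pool configuration from the snippet above, with explicit
# timeouts and retries so a broken connection fails instead of hanging.
http_client = urllib3.PoolManager(
    timeout=urllib3.Timeout(connect=10.0, read=300.0),
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where(),
    retries=urllib3.Retry(
        total=5,
        backoff_factor=0.2,
        status_forcelist=[500, 502, 503, 504],
    ),
)

# Placeholder endpoint and credentials.
client = Minio(
    'minio.example.com:9000',
    access_key='ACCESS_KEY',
    secret_key='SECRET_KEY',
    secure=True,
    http_client=http_client,
)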
Can you let us know how it behaves with the latest version?
Upgraded with pip3 install --upgrade minio
and am still running Python 3.5. Now the Python process silently dies after trying to send a 29GB file. First it almost blocks the server by consuming 90+% of its resources, and after a while the Python process, which was started via crontab, dies without any notification...
I assume there is no 4GB file size limit on Minio or its Docker image that I am not aware of?
Server-side log from the time of the broken pipe:
bme-fiber-minio.2.9fk65oiglp6q@node4 | time="2017-12-06T11:30:02Z" level=error msg="Unable to create object part." cause="read
tcp 176.20.0.22:9000->176.20.0.31:48786: i/o timeout" source="[object-handlers.go:883:objectAPIHandlers.PutObjectPartHandler()]"
stack="fs-v1-helpers.go:303:fsCreateFile fs-v1-multipart.go:543:fsObjects.PutObjectPart <autogenerated>:1:(*fsObjects).PutObject
Part object-handlers.go:881:objectAPIHandlers.PutObjectPartHandler api-router.go:48:(objectAPIHandlers).PutObjectPartHandler-fm"
My bad, the 29GB transfer attempt exhausts our client's swap:
cat /var/log/kern.log | grep python3
Dec 15 17:01:41 lbtf-mediator kernel: [4397030.116840] [ 1232] 1000 1232 2824531 1832646 7537 21 919631 0 python3
Dec 15 17:01:41 lbtf-mediator kernel: [4397030.116863] Out of memory: Kill process 1232 (python3) score 678 or sacrifice child
Dec 15 17:01:41 lbtf-mediator kernel: [4397030.116874] Killed process 1232 (python3) total-vm:11298124kB, anon-rss:7330584kB, file-rss:0kB, shmem-rss:0kB
Dec 15 17:01:41 lbtf-mediator kernel: [4397030.649052] oom_reaper: reaped process 1232 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
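As a side note, a small diagnostic sketch that logs the uploader's own peak RSS around the upload can confirm it is the client process, not the server, that grows (the commented-out fput_object call is only indicative):

import logging
import resource

log = logging.getLogger("uploader")

def log_peak_rss(tag):
    # ru_maxrss is reported in kilobytes on Linux
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    log.info("%s: peak RSS %.1f MB", tag, peak_kb / 1024.0)

log_peak_rss("before upload")
# client.fput_object(bucket, object_name, file_path)  # the actual upload
log_peak_rss("after upload")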
After increasing the swap, the process doesn't die but still can't finish the 29GB upload. In my log files I see thousands of warnings: WARNING Connection pool is full, discarding connection: <minio hostname>
Could this be due to too many worker threads trying to upload the parts?
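That warning comes from urllib3 when more connections are returned to a host's pool than its maxsize allows, and the extra connections are simply discarded. A hedged sketch of sizing the pool to the number of upload workers (the worker count here is an assumption, not something taken from this report):

import certifi
import urllib3

UPLOAD_WORKERS = 10  # assumed number of parallel part uploads

# maxsize bounds how many connections one host pool keeps; block=True makes
# extra workers wait for a free connection instead of opening a new one that
# is later dropped with "Connection pool is full, discarding connection".
http_client = urllib3.PoolManager(
    maxsize=UPLOAD_WORKERS,
    block=True,
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where(),
)

The pool can then be handed to the client via http_client as in the earlier sketch.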
After almost 7 hours the upload completed, leaving around 60GB of orphaned data in /.minio.sys/tmp. The next upload in line caused quite a few BrokenPipeErrors but succeeded in 8 minutes for 4GB. There is quite a discrepancy between the 29GB transfer taking 7 hours and the 4GB transfer taking only 8 minutes...
This is not reproducing, so closing this as working as intended. Please reopen if you think this is still occurring.
Maybe related to https://github.com/requests/requests/issues/2422. Sometimes, but quite frequently, we receive the following response after a Minio.fput_object call:
It then attempts to open a new HTTP connection but hangs, and the only way to proceed is to kill the Python process and restart it, after which it usually uploads fine.
Have any of you experienced this, and is there a way to capture this error better so we can retry explicitly?
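For the explicit retry, a minimal sketch that wraps fput_object and retries on connection-level failures (the exception set, attempt count, and backoff are my own assumptions, not something minio-py prescribes):

import logging
import time

from urllib3.exceptions import HTTPError as Urllib3HTTPError

log = logging.getLogger("uploader")

def upload_with_retry(client, bucket, object_name, file_path, attempts=3):
    # Retry the upload a few times on connection-level errors, with
    # exponential backoff, and re-raise once the attempts are exhausted.
    for attempt in range(1, attempts + 1):
        try:
            return client.fput_object(bucket, object_name, file_path)
        except (BrokenPipeError, ConnectionError, Urllib3HTTPError) as err:
            log.warning("Upload attempt %d/%d failed: %r",
                        attempt, attempts, err)
            if attempt == attempts:
                raise
            time.sleep(2 ** attempt)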
Here is the code snippet which causes this issue:
The warning message I get has to come from within Minio or the underlying urllib3, because our logger does not register a successful or failed upload; the process hangs somewhere in the fput_object call.
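One way to pin down whether the warning originates in minio-py or urllib3, and to see what happens right before the hang, is to enable debug logging before the fput_object call; a small sketch:

import logging

# Route everything to the console; urllib3 logs its retry warnings and
# connection attempts under the 'urllib3' logger hierarchy.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('urllib3').setLevel(logging.DEBUG)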