[Open] msakrejda opened this issue 10 years ago
We're running into some situations where we appear to get an EOF error from a perfectly legitimate input file. We're running a somewhat older s3cp (2f24149626958aa1a61b6493cfed30643c2cf70d), but looking at the history, I don't think any of the commits since then would address this. Could this be an s3cp problem, or can this error message occur due to upload issues? We've seen this on a number of instances, so it's unlikely to be a hardware issue in reading the source file.

For what it's worth, the errors seem to occur at around 22-25GB (of a 26.2GB file), although it's not regular enough for there to be any single threshold (and this doesn't seem deterministic; some large uploads did succeed recently).
This will be hard to track down without a reproducible test case.
I suspect this error is coming from https://github.com/kr/s3/blob/2f2414/s3util/uploader.go#L186 via https://github.com/kr/s3/blob/2f2414/s3util/uploader.go#L121 .
If you can instrument the code and verify or falsify that, it would help.
retryUploadPart tries at most twice for each part (see https://github.com/kr/s3/blob/2f2414/s3util/uploader.go#L26), so this could be an increased level of ordinary transient network problems manifesting as visible errors.
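If it helps, here's roughly the kind of experiment I mean. This is not the uploader's actual code, just a standalone sketch of a retry-with-backoff wrapper (putPart, retryPart, and the attempt counts are all made up for illustration); if raising the attempt limit like this makes the EOFs disappear, that would point at transient network errors exhausting the current two tries.

```go
// Hypothetical sketch, not uploader.go: retry a part upload more than
// twice, with exponential backoff, to test whether the EOFs are just
// transient network errors surfacing through the retry limit.
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// putPart stands in for the real request that uploads one part;
// here it fails randomly to simulate transient network errors.
func putPart(partNumber int) error {
	if rand.Intn(3) == 0 {
		return errors.New("EOF") // simulated transient failure
	}
	return nil
}

// retryPart tries putPart up to maxTry times with exponential backoff.
// s3util's retryUploadPart gives up after two attempts; raising that
// limit is the experiment here.
func retryPart(partNumber, maxTry int) error {
	var err error
	for i := 0; i < maxTry; i++ {
		if err = putPart(partNumber); err == nil {
			return nil
		}
		fmt.Printf("part %d attempt %d failed: %v\n", partNumber, i+1, err)
		time.Sleep(time.Duration(1<<uint(i)) * 100 * time.Millisecond)
	}
	return err
}

func main() {
	if err := retryPart(1342, 5); err != nil {
		fmt.Println("giving up:", err)
	}
}
```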
Thanks for the pointers. I figured this was a long shot. I'll see if I can dig in and get more info.
For what it's worth, I wasn't able to dig deeper, but uploading ~35GB of zeros seems to reliably reproduce it:
$ dd if=/dev/zero count=$((35*1024)) bs=1048576 | ./s3cp /dev/stdin https://$bucket.s3.amazonaws.com/$key
Put https://$bucket.s3.amazonaws.com/$key?partNumber=1342&uploadId=...: EOF
gof3r seems to have no trouble with this same file. According to its log output, it does seem to retry ~10 times over the course of the upload, but it never has to retry the same part. I've tried this twice so far.
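In case it helps anyone else reproduce this without dd and a pipe, here's a minimal sketch of the same experiment driven through the s3util API directly, following the DefaultConfig/Create usage from the README. The bucket URL and environment variable names are placeholders, not anything from our setup:

```go
// Sketch: stream ~35GB of zeros into one multipart upload via s3util,
// mirroring the dd | s3cp repro above.
package main

import (
	"log"
	"os"

	"github.com/kr/s3/s3util"
)

func main() {
	s3util.DefaultConfig.AccessKey = os.Getenv("S3_ACCESS_KEY")
	s3util.DefaultConfig.SecretKey = os.Getenv("S3_SECRET_KEY")

	w, err := s3util.Create("https://mybucket.s3.amazonaws.com/zeros", nil, nil)
	if err != nil {
		log.Fatal(err)
	}

	// 35*1024 blocks of 1MiB, matching dd's count and block size.
	zeros := make([]byte, 1<<20)
	for i := 0; i < 35*1024; i++ {
		if _, err := w.Write(zeros); err != nil {
			log.Fatalf("write failed at block %d: %v", i, err) // e.g. the EOF in question
		}
	}
	// Close completes the multipart upload.
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```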