kr / s3

Go package for Amazon’s S3 API
http://godoc.org/github.com/kr/s3
MIT License

Premature EOF in some situations for s3cp? #21

Open msakrejda opened 10 years ago

msakrejda commented 10 years ago

We're running into some situations where we appear to get an EOF from a perfectly legitimate input file:

Put https://foo.s3.amazonaws.com/bar/baz?partNumber=1722&uploadId=<redacted>: EOF

We're running a somewhat older s3cp (2f24149626958aa1a61b6493cfed30643c2cf70d), but looking at the history, I don't think any of the commits since would address this. Could this be an s3cp problem, or can this error message occur due to upload issues? We've seen this on a number of instances, so it's unlikely to be a hardware issue in reading the source file.

msakrejda commented 10 years ago

For what it's worth, the errors seem to occur at around 22-25GB (of a 26.2GB file), although it's not regular enough for there to be any single threshold (and this doesn't seem deterministic; some large uploads did succeed recently).

kr commented 10 years ago

This will be hard to track down without a reproducible test case.

I suspect this error is coming from https://github.com/kr/s3/blob/2f2414/s3util/uploader.go#L186 via https://github.com/kr/s3/blob/2f2414/s3util/uploader.go#L121 .

If you can instrument the code and verify or falsify that, it would help.

retryUploadPart tries at most twice for each part (see https://github.com/kr/s3/blob/2f2414/s3util/uploader.go#L26), so this could be an increased level of ordinary transient network problems manifesting as visible errors.

msakrejda commented 10 years ago

Thanks for the pointers. I figured this was a long shot. I'll see if I can dig in and get more info.

msakrejda commented 10 years ago

For what it's worth, I wasn't able to dig deeper, but uploading ~35GB of zeros seems to reliably reproduce it:

$ dd if=/dev/zero count=$((35*1024)) bs=1048576 | ./s3cp /dev/stdin https://$bucket.s3.amazonaws.com/$key
Put https://$bucket.s3.amazonaws.com/$key?partNumber=1342&uploadId=...: EOF

gof3r seems to have no trouble with this same file. According to its log output, it does seem to retry ~10 times over the course of the upload, but it never has to retry the same part. I've tried this twice so far.