springmeyer opened this issue 6 years ago
Failed to download https://mason-binaries.s3.amazonaws.com/osx-x86_64/android-ndk/arm-9-r13b.tar.gz (returncode: 56)
on an OS X Travis build: https://travis-ci.org/mapbox/mason/jobs/304807664#L1357
/cc @mapsam, who mentioned seeing multiple/repeated clang++ download failures. @mapsam, was this on OS X or within Docker?
@springmeyer I was on OSX and saw hangs with clang++ when using the following curl command:
curl -sSfL https://s3.amazonaws.com/mason-binaries/osx-x86_64/clang++/5.0.0.tar.gz | tar --gunzip --extract --strip-components=1
The connection to the file is established relatively quickly, but the 30MB download takes much longer than other 30MB files do.
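To put a number on "takes much longer", curl's --write-out (-w) variables can report the size, total time and average speed of a transfer while discarding the body. A minimal sketch, assuming bash and curl; measure_download is a hypothetical helper name, not part of mason:

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a URL to /dev/null and print how many bytes
# arrived, how long the whole transfer took, and the average speed.
measure_download() {
  local url=$1
  # -s: no progress meter, -S: still show errors, -L: follow redirects.
  curl -sS -L -o /dev/null \
    -w 'size_bytes=%{size_download} time_s=%{time_total} speed_Bps=%{speed_download}\n' \
    "$url"
}

# Example (network access required):
# measure_download "https://s3.amazonaws.com/mason-binaries/osx-x86_64/clang++/5.0.0.tar.gz"
```

Running it against the clang++ 5.0.0 URL and against a similarly sized file on another host would show whether the slowdown is specific to the mason bucket.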
@mapsam, okay, thanks for the details. As of https://github.com/mapbox/mason/commit/1727795f314dbef66fb0f84ee98a82a62e77b5d1, mason outputs the exact returncode on error. That is what produces the (returncode: 56) shown above in the error I saw in @artemp's commit, where the Android NDK failed to download. Let's keep an eye on whether we always see 56 (CURLE_RECV_ERROR) or whether curl reports other errors.
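For reference, the general pattern for surfacing curl's exit status looks roughly like this. It is a sketch only, assuming bash; download is a hypothetical function and this is not the actual mason.sh code, though it prints messages in the same "Failed to download ... (returncode: N)" form quoted in this thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not mason.sh): fetch a URL to a file and, on failure,
# report curl's exit status instead of failing silently.
download() {
  local url=$1 dest=$2 status=0
  # -f makes curl exit non-zero (22) on HTTP errors >= 400 instead of
  # saving the error page; -sS hides progress but keeps error messages.
  curl -sSfL "$url" -o "$dest" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "Failed to download $url (returncode: $status)" >&2
    return "$status"
  fi
}
```

Returning the status, rather than only printing it, lets the caller decide whether to retry or abort.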
Not an S3 issue, but noting nonetheless that I also just hit this on an OS X Travis job:
$ ./mason build ${MASON_NAME} ${MASON_VERSION}
Cloning into '/Users/travis/build/mapbox/mason/mason_packages/.build/mapnik-vf02a25901'...
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
This looks like a git clone failing inside curl, also with error 56.
Now seeing:
* Downloading binary package https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz
Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz (returncode: 35)
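The codes seen so far (56 and 35, plus 22, which curl uses for HTTP errors >= 400 when run with -f) are all in curl's documented exit-code list. A small lookup, as a sketch: describe_curl_exit is a hypothetical name, and the wording paraphrases curl's documentation rather than quoting it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map curl exit codes seen in this thread to short
# descriptions paraphrased from curl's EXIT CODES documentation.
describe_curl_exit() {
  case "$1" in
    0)  echo "success" ;;
    22) echo "HTTP error >= 400 (reported because of -f/--fail)" ;;
    35) echo "SSL/TLS connect error (handshake failed)" ;;
    56) echo "failure receiving network data (CURLE_RECV_ERROR)" ;;
    *)  echo "other exit code $1 (see curl's EXIT CODES documentation)" ;;
  esac
}
```

Usage: describe_curl_exit 35 prints the TLS handshake description, which matches the gnutls_handshake() failure reported elsewhere in this thread.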
ugh, also just hit:
oci runtime error: exec failed: container_linux.go:265: starting container process caused "could not create session key: disk quota exceeded"
hrm:
./scripts/clang-format.sh
Downloading https://s3.amazonaws.com/mason-binaries/linux-x86_64/clang++/5.0.0.tar.gz
curl: (22) The requested URL returned error: 429 Too Many Requests
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
make: *** [format] Error 2
https://travis-ci.org/mapbox/node-cpp-skel/jobs/305550318#L552
/cc @rclark, whom I spoke with about this a few weeks ago. @rclark, S3 downloads from the mason bucket appear to be degrading, and the problem is getting worse. Any ideas for things to test or try to get to the bottom of why this is happening?
Do you have any way to observe the S3 connections or S3 errors more directly? All the error codes you've got here appear to come from downstream applications that are perhaps reacting to S3 networking failures. But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown.
We are using curl on the command line to download the binary .tar.gz files from S3: https://github.com/mapbox/mason/blob/2602c302fd17d70fcef3f2fe90482d0e6232fdb8/mason.sh#L533-L544.
In https://github.com/mapbox/mason/commit/1727795f314dbef66fb0f84ee98a82a62e77b5d1 I modified things to actually try to print the HTTP error code.
But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown.
That one (The requested URL returned error: 429 Too Many Requests) struck me as well; it looks to be coming from curl itself rather than from the bash output logic I added.
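One way to see the raw HTTP status, independent of curl's own error text, is the %{http_code} write-out variable (it prints 000 when no HTTP response arrived at all). A sketch under stated assumptions: http_status and classify_status are hypothetical helpers, and the categories only encode the distinction discussed here (S3 throttles with 503 SlowDown, so a 429 likely comes from something other than S3):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the final HTTP status code of a URL.
http_status() {
  curl -sS -L -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical classification of that status code.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    429) echo "429 Too Many Requests (not an S3 throttling code)" ;;
    503) echo "503 (S3 SlowDown or service unavailable)" ;;
    000) echo "no HTTP response (connection-level failure)" ;;
    *)   echo "HTTP $1" ;;
  esac
}

# Example (network access required):
# classify_status "$(http_status https://s3.amazonaws.com/mason-binaries/linux-x86_64/clang++/5.0.0.tar.gz)"
```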
I think I'd have to take it to AWS support. You might try checking for x-amz headers in the HTTP response to see if S3 is trying to tell you anything there.
Thanks @rclark, signing off for the holiday now. I will add -v to dump the headers the next time I see persistent errors.
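Besides -v (which writes the response headers to stderr prefixed with "< "), curl's -D - dumps them plainly to stdout, which makes it easy to keep only the x-amz-* lines such as x-amz-request-id and x-amz-id-2 that AWS support can use to trace a request. A minimal sketch, assuming bash, grep and tr; amz_headers is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical filter: read raw HTTP response headers on stdin and keep only
# x-amz-* lines (case-insensitive), stripping the trailing carriage returns.
amz_headers() {
  grep -i '^x-amz-' | tr -d '\r'
}

# Example (network access required): dump headers, discard the body.
# curl -sS -L -D - -o /dev/null \
#   https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz | amz_headers
```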
Another one, which looks related only to the Travis network, since the upstream is not AWS. I probably won't post more of this kind, to avoid making this ticket too noisy, but I'm posting this one since I haven't seen it before:
* Downloading http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2...
curl: (56) Recv failure: Connection reset by peer
Failed to download http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2 (returncode: 56)
CMake Error at cmake/mason.cmake:103 (message):
[Mason] Failed to download
https://mason-binaries.s3.amazonaws.com/headers/rapidjson/1.1.0.tar.gz:
curl: (35) gnutls_handshake() failed: Error in the pull function.
Error message in Travis when trying to download recently published LLVM 6.0.0 binaries:
Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/android-ndk/arm-14-r16b.tar.gz (returncode: 141)
Note the (returncode: 141).
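141 does not look like a curl exit code. Shells report 128 + N when a command is killed by signal N, so 141 means signal 13, SIGPIPE. One plausible (unconfirmed) reading: if the download runs through a pipe such as curl ... | tar ..., the writer gets SIGPIPE when the reader exits early. A sketch, assuming a POSIX-style shell whose kill -l accepts an exit status; decode_status is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain a shell exit status. Statuses above 128 mean
# "killed by signal (status - 128)"; kill -l converts the status to a name.
decode_status() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128)) (SIG$(kill -l "$status"))"
  else
    echo "exited with code $status"
  fi
}
```

Usage: decode_status 141 reports signal 13 (SIGPIPE), while decode_status 56 reports an ordinary exit code, as with curl's receive error.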
I feel like I've been seeing an increased number of network failures when fetching binaries from S3 over the last month or more. This ticket tracks them so we can start assembling a fuller picture of the failures and see whether there is a pattern.