mapbox / mason

Cross platform package manager for C/C++ apps
BSD 2-Clause "Simplified" License

Network errors impacting mason downloads #516

Open springmeyer opened 6 years ago

springmeyer commented 6 years ago

I feel like I've been seeing an increased number of network failures when fetching binaries from s3 over the last month or more. This ticket is meant to track them so we can start assembling a fuller picture of the failures and see if there is a pattern.

springmeyer commented 6 years ago

Failed to download https://mason-binaries.s3.amazonaws.com/osx-x86_64/android-ndk/arm-9-r13b.tar.gz (returncode: 56) on OS X travis build: https://travis-ci.org/mapbox/mason/jobs/304807664#L1357

springmeyer commented 6 years ago

/cc @mapsam who mentioned seeing multiple/repeated clang++ download failures. @mapsam was this on OS X or within docker?

mapsam commented 6 years ago

@springmeyer I was on OSX and saw hangs with clang++ when using the following curl command:

curl -sSfL https://s3.amazonaws.com/mason-binaries/osx-x86_64/clang++/5.0.0.tar.gz | tar --gunzip --extract --strip-components=1

The connection to the file is made relatively quickly, but the 30MB download takes much longer than other 30MB files.
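
One way to put numbers on "connects quickly, downloads slowly" is curl's `-w` (write-out) option, which can report connect time, total time, and average speed for a single transfer. The sketch below is only an illustration: `measure_download` is a hypothetical helper name, not something that exists in mason.

```shell
# Hypothetical helper (not part of mason): fetch a URL, discard the body,
# and print the connect time, total time, size, and average speed.
measure_download() {
  curl -sSfL -o /dev/null \
    -w 'connect=%{time_connect}s total=%{time_total}s bytes=%{size_download} speed=%{speed_download}B/s\n' \
    "$1"
}
```

Running it against the clang++ URL above and against another file of similar size would give a side-by-side comparison of the two transfers.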

springmeyer commented 6 years ago

@mapsam, okay thanks for the details. After https://github.com/mapbox/mason/commit/1727795f314dbef66fb0f84ee98a82a62e77b5d1 mason will now output the exact returncode on error. This is what is producing the:

(returncode: 56)

above, in the error from @artemp's commit where the android SDK failed to download. Let's keep an eye on whether we always see 56 (CURLE_RECV_ERROR) or whether curl reports other errors.
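
To make these return codes easier to read in CI logs, a small lookup can translate them. This is a sketch under assumptions: `describe_curl_exit` is a hypothetical name, only a few entries from curl's documented exit-status list are included, and the "above 128" branch relies on the usual shell convention that such a status means the process was killed by signal N-128 (which may or may not be how mason's own returncode is produced).

```shell
# Hypothetical helper (not part of mason): describe a curl exit status.
# Only a handful of codes from curl's documented list are covered.
describe_curl_exit() {
  code="$1"
  case "$code" in
    0)  echo "success" ;;
    22) echo "CURLE_HTTP_RETURNED_ERROR: HTTP status >= 400 (with -f)" ;;
    35) echo "CURLE_SSL_CONNECT_ERROR: TLS handshake failed" ;;
    56) echo "CURLE_RECV_ERROR: failure receiving network data" ;;
    *)
      # Shell convention: a status above 128 means "killed by signal N-128".
      if [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
      else
        echo "unrecognized curl exit code $code"
      fi
      ;;
  esac
}
```

For example, `describe_curl_exit 56` prints the CURLE_RECV_ERROR line.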

springmeyer commented 6 years ago

Not an s3 issue, but noting nonetheless that I also just hit this on an OS X travis job:

$ ./mason build ${MASON_NAME} ${MASON_VERSION}
Cloning into '/Users/travis/build/mapbox/mason/mason_packages/.build/mapnik-vf02a25901'...
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

Which looks like a git clone failing in curl, also with 56 as error.

springmeyer commented 6 years ago

Now seeing:

* Downloading binary package https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz
Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz (returncode: 35)

https://travis-ci.org/mapbox/mason/jobs/305567441#L495

springmeyer commented 6 years ago

ugh, also just hit:

oci runtime error: exec failed: container_linux.go:265: starting container process caused "could not create session key: disk quota exceeded"

https://travis-ci.org/mapbox/mason/jobs/305567199

springmeyer commented 6 years ago

hrm:

./scripts/clang-format.sh
Downloading https://s3.amazonaws.com/mason-binaries/linux-x86_64/clang++/5.0.0.tar.gz
curl: (22) The requested URL returned error: 429 Too Many Requests
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
make: *** [format] Error 2

https://travis-ci.org/mapbox/node-cpp-skel/jobs/305550318#L552

springmeyer commented 6 years ago

/cc @rclark, whom I spoke with about this a few weeks ago. @rclark - s3 downloads from the mason bucket appear to be degrading, and the problem is getting worse. Any ideas for things to test or try to get to the bottom of why this is happening?

rclark commented 6 years ago

Do you have any way to observe the S3 connections or S3 errors more directly? All the error codes you've got here appear to be from downstream applications that are perhaps reacting to S3 networking failures. But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown.

springmeyer commented 6 years ago

We are using curl on the command line to download the binary .tar.gz files from s3: https://github.com/mapbox/mason/blob/2602c302fd17d70fcef3f2fe90482d0e6232fdb8/mason.sh#L533-L544.

In https://github.com/mapbox/mason/commit/1727795f314dbef66fb0f84ee98a82a62e77b5d1 I modified things to actually try to print the http error code.

> But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown.

That one (The requested URL returned error: 429 Too Many Requests) struck me as well - that looks to be coming from the curl code itself rather than the bash output logic I added.

rclark commented 6 years ago

I think I'd have to take it to AWS support. You might try to check for x-amz headers in the HTTP response to see if S3 is trying to tell you anything there.

springmeyer commented 6 years ago

Thanks @rclark - signing off for the holiday now. I will add -v to dump the headers next time I see persistent errors.
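
As a lighter-weight alternative to full `-v` output, curl's `-D -` option writes just the response headers to stdout, which can then be filtered down to the status line and any `x-amz-*` headers. This is a sketch only: `filter_s3_headers` and `s3_debug_headers` are hypothetical helper names, not part of mason.

```shell
# Hypothetical helpers (not part of mason).

# Read a raw HTTP header dump on stdin and keep only the status line(s)
# and any x-amz-* headers, with carriage returns stripped.
filter_s3_headers() {
  tr -d '\r' | grep -i -E '^(HTTP/|x-amz-)'
}

# Fetch a URL, discard the body, and print the filtered response headers.
s3_debug_headers() {
  curl -sS -D - -o /dev/null "$1" | filter_s3_headers
}
```

Calling `s3_debug_headers` with one of the failing mason-binaries URLs on a bad run would show the status code alongside whatever S3-specific headers came back, which is the kind of detail AWS support is likely to ask for.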

springmeyer commented 6 years ago

Another one, which looks related only to the travis network, since the upstream is not AWS. I probably won't post more of this kind to avoid being too noisy on this ticket, but I'm posting this one since I've not seen it before:

* Downloading http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2...
curl: (56) Recv failure: Connection reset by peer
Failed to download http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2 (returncode: 56)

https://travis-ci.org/mapbox/mason/jobs/308033127#L1784

springmeyer commented 6 years ago

CMake Error at cmake/mason.cmake:103 (message):
  [Mason] Failed to download
  https://mason-binaries.s3.amazonaws.com/headers/rapidjson/1.1.0.tar.gz:
  curl: (35) gnutls_handshake() failed: Error in the pull function.

https://circleci.com/gh/mapbox/mapbox-gl-native/88893?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

sssoleileraaa commented 6 years ago

Error message in Travis when trying to download recently published LLVM 6.0.0 binaries:

Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/android-ndk/arm-14-r16b.tar.gz (returncode: 141)

Note: (returncode: 141)