rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking issue for spurious network failures on bots #40474

Open alexcrichton opened 7 years ago

alexcrichton commented 7 years ago

I'm hoping to catalog network failures on the bots here, to keep an eye on the rate at which they happen as well as the common failure modes. If they're common enough, we should investigate solutions.
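To keep a rough count of how often each failure mode shows up, job logs could be bucketed by the error signatures that recur in this thread. This is only an illustration. The `FailureMode` categories and the `classify`/`tally` helpers are hypothetical, and the substrings are copied from the logs quoted in this issue rather than taken from any existing tooling.

```rust
use std::collections::HashMap;

/// Hypothetical categories for the failure modes quoted in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum FailureMode {
    GitRpcFailed,      // "error: RPC failed; curl 56 SSLRead() return error -36"
    ConnectionReset,   // "curl: (56) Recv failure: Connection reset by peer"
    ConnectionRefused, // "curl: (7) Failed to connect ... Connection refused"
    NoOutputTimeout,   // Travis: "No output has been received in the last 30m0s"
    Unknown,
}

/// Classify one job log by searching for substrings seen in the logs above.
/// More specific signatures are checked first.
fn classify(log: &str) -> FailureMode {
    if log.contains("error: RPC failed") {
        FailureMode::GitRpcFailed
    } else if log.contains("Recv failure: Connection reset") {
        FailureMode::ConnectionReset
    } else if log.contains("Connection refused") {
        FailureMode::ConnectionRefused
    } else if log.contains("No output has been received in the last") {
        FailureMode::NoOutputTimeout
    } else {
        FailureMode::Unknown
    }
}

/// Count how many logs fall into each failure mode.
fn tally<'a, I>(logs: I) -> HashMap<FailureMode, usize>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = HashMap::new();
    for log in logs {
        *counts.entry(classify(log)).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let logs = [
        "curl: (7) Failed to connect to www.openssl.org port 443: Connection refused",
        "error: RPC failed; curl 56 SSLRead() return error -36",
        "error: RPC failed; curl 56 SSLRead() return error -36",
    ];
    let counts = tally(logs.iter().copied());
    for (mode, n) in &counts {
        println!("{:?}: {}", mode, n);
    }
}
```

Plain substring matching is crude, but it is enough to notice when one category starts dominating.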

GitHub errors

Submodule clone timeout

https://travis-ci.org/rust-lang/rust/jobs/210363616

Cloning into '/home/travis/build/rust-lang/rust/src/llvm'...

No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.

Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

The build has been terminated

Fixed failures

Failure to fetch OpenSSL tarball

Fixed by https://github.com/rust-lang/rust/pull/40545

https://travis-ci.org/rust-lang/rust/jobs/211285568

curl: (7) Failed to connect to www.openssl.org port 443: Connection refused

command did not execute successfully: "curl" "-o" "/checkout/obj/build/mipsel-unknown-linux-gnu/openssl/openssl-1.0.2k.tar.tmp" "https://www.openssl.org/source/openssl-1.0.2k.tar.gz"
expected success, got: exit code: 7

CentOS 5 urls changed

https://travis-ci.org/rust-lang/rust/jobs/218179477

hopefully fixed by https://github.com/rust-lang/rust/pull/41045

CentOS 5.11 became EOL; apparently the base image could no longer run yum update successfully.

Crates.io errors

Presumed bug in Cargo

ishitatsuyuki commented 7 years ago

Travis uses GCE, and I doubt Google has problems with its high-quality networking, so I guess this is some kind of anti-DDoS protection being triggered. One option is to mirror the files; another is to force them into the build cache.

Psst... using Google Cloud Storage would keep the transfers entirely inside Google's cloud and could reduce latency, errors, and the egress bill.
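The mirroring idea amounts to trying the upstream URL first and falling back to a copy hosted elsewhere. A minimal sketch, assuming a hypothetical mirror at `mirror.example.com` and a `fetch` closure standing in for whatever downloader the build actually uses; `fetch_with_mirrors` is a hypothetical helper, not existing rustbuild code.

```rust
/// Try each URL in order and return the first successful download,
/// or every error message if all of them fail.
fn fetch_with_mirrors<F>(urls: &[&str], mut fetch: F) -> Result<Vec<u8>, Vec<String>>
where
    F: FnMut(&str) -> Result<Vec<u8>, String>,
{
    let mut errors = Vec::new();
    for &url in urls {
        match fetch(url) {
            Ok(bytes) => return Ok(bytes),
            Err(e) => errors.push(format!("{}: {}", url, e)),
        }
    }
    Err(errors)
}

fn main() {
    let urls = [
        "https://www.openssl.org/source/openssl-1.0.2k.tar.gz", // upstream, from the failure above
        "https://mirror.example.com/openssl-1.0.2k.tar.gz",     // hypothetical mirror
    ];
    // Simulated downloader: upstream refuses connections, the mirror succeeds.
    let result = fetch_with_mirrors(&urls, |url| {
        if url.contains("openssl.org") {
            Err("Connection refused".to_string())
        } else {
            Ok(b"tarball bytes".to_vec())
        }
    });
    println!("{:?}", result.map(|bytes| bytes.len()));
}
```

Injecting `fetch` keeps the fallback policy separate from the HTTP client, so it can be exercised without network access.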

alexcrichton commented 7 years ago

Looks like we've got new failures reaching crates.io during the cargotest phase, and PRs to update that code to implement retries would be greatly appreciated!
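A generic retry helper with exponential backoff is one shape such a PR could take. This is a sketch, not the actual cargotest code: `retry_with_backoff` is a hypothetical name, and the attempt count and base delay are placeholders.

```rust
use std::thread;
use std::time::Duration;

/// Run `op` up to `max_attempts` times. After each failure, sleep for
/// base * 2^(attempt - 1) before trying again. Returns the first success,
/// or the last error once the attempts run out.
fn retry_with_backoff<T, E, F>(max_attempts: u32, base: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                // Saturating arithmetic avoids overflow panics for large attempt counts.
                let delay = base.saturating_mul(2u32.saturating_pow(attempt - 1));
                thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulate a request that fails twice (e.g. crates.io timing out), then succeeds.
    let result: Result<&str, String> = retry_with_backoff(3, Duration::from_millis(10), |attempt| {
        if attempt < 3 {
            Err(format!("attempt {} failed", attempt))
        } else {
            Ok("downloaded")
        }
    });
    println!("{:?}", result);
}
```

Passing the attempt number into `op` makes it easy to log which try finally succeeded.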

aidanhs commented 7 years ago

https://ci.appveyor.com/project/rust-lang/rust/build/1.0.2517/job/78c29lc858e7a0nh mentioned in the top comment is interesting because it's on AppVeyor... and we've just seen it again on Travis: https://github.com/rust-lang/rust/pull/41395#issuecomment-295308379.

alexcrichton commented 7 years ago

@aidanhs I think the error on AppVeyor is different from the one you saw on Travis. The one on AppVeyor (IIRC) was a legitimate bug in Cargo that needed fixing. I've also seen the Travis one on another PR and am having trouble reproducing it locally.

Unfortunately I don't know the cause of this failure...

alexcrichton commented 7 years ago

(updated the top comment to reflect this fact)

Also note that I've been less vigilant about categorizing every failure in the comment above, so many recently linked PRs are likely not classified under any of the cases in the top comment.

aidanhs commented 7 years ago

Huh, yes: the "failed to parse object" error on AppVeyor just seems to be a consistent feature... that seems worrying.

Mark-Simulacrum commented 7 years ago

I got this from a non-bors build on Travis for #41684; unfortunately I didn't get the raw log link...

Edit: @aidanhs was very kind and got me the link: https://travis-ci.org/rust-lang/rust/builds/228430396.

248.99s$ git fetch origin +refs/pull/41684/merge:
remote: Counting objects: 637872, done.
remote: Compressing objects: 100% (119898/119898), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41684/merge: " failed. Retrying, 2 of 3.
remote: Counting objects: 637872, done.
remote: Compressing objects: 100% (119898/119898), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41684/merge: " failed. Retrying, 3 of 3.
remote: Counting objects: 637872, done.
remote: Compressing objects: 100% (119898/119898), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41684/merge: " failed 3 times.
The command "git fetch origin +refs/pull/41684/merge:" failed and exited with 128 during .

Mark-Simulacrum commented 7 years ago

Potentially interesting to look at #34228 to help with git submodule related issues.

aidanhs commented 7 years ago

There are a few places I'm aware of where git is used and can fail. I'll give a brief update on each of them and its status.

There's more discussion about some of this on #40772, but my dream solution is to cache all these git repos so updates on every build are really cheap. Unfortunately, caching naively doesn't work because they take a long time to extract and cause the builds to fail, so some thought is needed.
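Once a cached copy of a repository is available on the builder, `git clone --reference` can borrow objects from it so that only new objects come over the network. A sketch, assuming the cache has already been restored at a hypothetical path; `clone_command` is a hypothetical helper, and this does not address the extraction-time problem.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `git clone` command. If `cache` points at an
/// existing local clone, `--reference` lets git reuse its objects, and
/// `--dissociate` copies the borrowed objects so the new clone does not
/// depend on the cache afterwards.
fn clone_command(url: &str, dest: &str, cache: Option<&Path>) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("clone");
    if let Some(cache) = cache.filter(|p| p.exists()) {
        cmd.arg("--reference").arg(cache).arg("--dissociate");
    }
    cmd.arg(url).arg(dest);
    cmd
}

fn main() {
    // Hypothetical location where the CI cache restores a previous clone.
    let cache = Path::new("/home/travis/git-cache/rust");
    let cmd = clone_command("https://github.com/rust-lang/rust.git", "rust-lang/rust", Some(cache));
    // Printing only; if the cache directory is missing, the flags are simply omitted.
    println!("{:?}", cmd);
}
```

Falling back to a plain clone when the cache is absent means a cold cache costs no more than today's behaviour.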

carols10cents commented 7 years ago

Not sure whether @aidanhs's travis-ci/travis-build#1020 will fix this, but I wanted to track it: https://travis-ci.org/rust-lang/rust/jobs/229744891 (I'm about to retry, so I'm not sure the link will remain valid):

$ git clone --depth=1 https://github.com/rust-lang/rust.git rust-lang/rust
Cloning into 'rust-lang/rust'...
remote: Counting objects: 10175, done.
remote: Compressing objects: 100% (8953/8953), done.
remote: Total 10175 (delta 5441), reused 2470 (delta 1126), pack-reused 0
Receiving objects: 100% (10175/10175), 8.09 MiB | 4.56 MiB/s, done.
Resolving deltas: 100% (5441/5441), done.
Checking out files: 100% (9696/9696), done.
$ cd rust-lang/rust
154.86s$ git fetch origin +refs/pull/41809/merge:
remote: Counting objects: 639664, done.
remote: Compressing objects: 100% (120194/120194), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41809/merge: " failed. Retrying, 2 of 3.
remote: Counting objects: 639664, done.
remote: Compressing objects: 100% (120194/120194), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41809/merge: " failed. Retrying, 3 of 3.
remote: Counting objects: 639664, done.
remote: Compressing objects: 100% (120194/120194), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41809/merge: " failed 3 times.
The command "git fetch origin +refs/pull/41809/merge:" failed and exited with 128 during .

carols10cents commented 7 years ago

Saw this same git fetch failure at https://travis-ci.org/rust-lang/rust/jobs/230354402:

$ cd rust-lang/rust
308.02s$ git fetch origin +refs/pull/41857/merge:
remote: Counting objects: 640316, done.
remote: Compressing objects: 100% (120304/120304), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41857/merge: " failed. Retrying, 2 of 3.
remote: Counting objects: 640316, done.
remote: Compressing objects: 100% (120304/120304), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41857/merge: " failed. Retrying, 3 of 3.
remote: Counting objects: 640316, done.
remote: Compressing objects: 100% (120304/120304), done.
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The command "eval git fetch origin +refs/pull/41857/merge: " failed 3 times.
The command "git fetch origin +refs/pull/41857/merge:" failed and exited with 128 during .
Your build has been stopped.
/Users/travis/.travis/job_stages: line 151: shell_session_update: command not found

aidanhs commented 7 years ago

Rebased and pinged the PR I made to help with the cloning issue above.

Also, it might be worth splitting this issue into at least two (perhaps git-related and non-git-related)?

bjorn3 commented 7 years ago

There is a spelling error in the issue description: "Fixed faiulres".

durka commented 6 years ago

(fixed by #46715)

Ran into problems downloading crosstool-ng: https://travis-ci.org/rust-lang/rust/jobs/315943014

[00:01:55] Step 7/21 : RUN sh /scripts/crosstool-ng.sh
[00:01:55]  ---> Running in 3355bcff4861
[00:01:56] + url=http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.22.0.tar.bz2
[00:01:56] + curl -f http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.22.0.tar.bz2
[00:01:56] + tar xjf -
[00:01:56]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
[00:01:56]                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:06 --:--:--     0
[00:04:02] curl: (56) Recv failure: Connection reset by peer
[00:04:02] 
[00:04:02] bzip2: Compressed file ends unexpectedly;
[00:04:02]  perhaps it is corrupted?  *Possible* reason follows.
[00:04:02] bzip2: Inappropriate ioctl for device
[00:04:02]  Input file = (stdin), output file = (stdout)
[00:04:02] 
[00:04:02] It is possible that the compressed file(s) have become corrupted.
[00:04:02] You can use the -tvv option to test integrity of such files.
[00:04:02] 
[00:04:02] You can use the `bzip2recover' program to attempt to recover
[00:04:02] data from undamaged sections of corrupted files.
[00:04:02] 
[00:04:02] tar: Child returned status 2
[00:04:02] tar: Error is not recoverable: exiting now
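The `curl -f ... | tar xjf -` pipeline in this log means a dropped connection surfaces as a confusing bzip2 error rather than a clean download failure. One possible hardening, not necessarily what #46715 did, is to download to a file first and extract only on success. The sketch below just builds the two commands; the helper names and output filename are hypothetical.

```rust
use std::process::Command;

const URL: &str = "http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.22.0.tar.bz2";

/// Download to a file rather than piping into tar, so a dropped connection
/// fails the curl step itself and that step can be retried on its own.
fn download_command(url: &str, out: &str, retries: u32) -> Command {
    let mut cmd = Command::new("curl");
    // -f: fail on HTTP errors; --retry: curl's built-in retry for errors it deems transient.
    cmd.arg("-f")
        .arg("--retry")
        .arg(retries.to_string())
        .arg("-o")
        .arg(out)
        .arg(url);
    cmd
}

/// Extract only after the download has fully succeeded.
fn extract_command(archive: &str) -> Command {
    let mut cmd = Command::new("tar");
    cmd.arg("xjf").arg(archive);
    cmd
}

fn main() {
    // The commands are only printed here; running them would need network access.
    println!("{:?}", download_command(URL, "crosstool-ng.tar.bz2", 5));
    println!("{:?}", extract_command("crosstool-ng.tar.bz2"));
}
```

Note that curl's `--retry` only covers errors curl classifies as transient (timeouts and certain HTTP status codes), so a connection reset like the one above may still need an outer retry loop around the download step.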
pietroalbini commented 6 years ago

All merges on the auto branch have been failing to download from the network for the past hour. Every build sits idle for 30 minutes and is then cancelled.

Failure time | PR | Logs
2018-06-07T15:12:09Z | https://github.com/rust-lang/rust/issues/51283 | https://api.travis-ci.org/v3/job/389279440/log.txt
2018-06-07T14:36:29Z | https://github.com/rust-lang/rust/pull/51407 | https://api.travis-ci.org/v3/job/389263317/log.txt
2018-06-07T14:02:11Z | https://github.com/rust-lang/rust/pull/51399 | https://api.travis-ci.org/v3/job/389217017/log.txt
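When a download stalls, the job sits idle until Travis's 30-minute no-output limit cancels it. A per-step deadline would surface the hang sooner. A minimal sketch using only `std::process`; `wait_with_timeout` is a hypothetical helper, and it enforces a wall-clock limit rather than Travis's no-output rule.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Wait for `child`, killing it if it has not exited within `timeout`.
/// Returns Ok(Some(status)) on a normal exit and Ok(None) if it was killed.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait polls without blocking.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?; // reap the killed process
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(50));
    }
}

fn main() -> io::Result<()> {
    // Assumes a Unix-like system where `sleep` is available; it stands in for a stalled download.
    let mut child = Command::new("sleep").arg("10").spawn()?;
    match wait_with_timeout(&mut child, Duration::from_secs(1))? {
        Some(status) => println!("exited: {}", status),
        None => println!("killed after timeout"),
    }
    Ok(())
}
```

Combined with a retry, a hung download would then cost a few minutes instead of the whole 30-minute window.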