yarnpkg / yarn

The 1.x line is frozen - features and bugfixes now happen on https://github.com/yarnpkg/berry
https://classic.yarnpkg.com

502 bad gateway #2769

Closed tjwebb closed 7 years ago

tjwebb commented 7 years ago

Do you want to request a feature or report a bug? BUG

What is the current behavior? Installation on Heroku fails with:

       error An unexpected error occurred: "https://registry.yarnpkg.com/cryptiles/-/cryptiles-2.0.5.tgz: Request failed "502 Bad Gateway"".

If the current behavior is a bug, please provide the steps to reproduce. Behavior is intermittent. A subsequent build with identical parameters worked.

What is the expected behavior? It should not error

Please mention your node.js, yarn and operating system version. Heroku (cedar 14)

       Downloading and installing node 7.6.0...
       Using default npm version: 4.1.2
       Resolving yarn version (latest) via semver.io...
       Downloading and installing yarn (0.18.2)...
       Installed yarn 0.18.2
jonaskello commented 7 years ago

I'm getting something similar. Intermittent 503 Service Unavailable. Sometimes it works, sometimes it does not:

[1/4] Resolving packages...
[2/4] Fetching packages...
info If you think this is a bug, please open a bug report with the information provided in "/builds/systemair/diaq-app/yarn-error.log".
error An unexpected error occurred: "https://registry.yarnpkg.com/@types/mocha/-/mocha-2.2.33.tgz: Request failed \"503 Service Unavailable\"".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

EDIT:

If I keep refreshing https://registry.yarnpkg.com/ in chrome I sometimes (like every 10th time) get this:

Error 503 All backends failed or unhealthy

All backends failed or unhealthy

Guru Mediation:

Details: cache-bma7035-BMA 1487951750 907364723

Varnish cache server
chrisatomix commented 7 years ago

Also experiencing this issue, but we're using Gitlab:

Installing Yarn Dependencies
yarn install v0.19.1
[1/4] Resolving packages...
[2/4] Fetching packages...
info If you think this is a bug, please open a bug report with the information provided in "(redacted)/yarn-error.log".
error An unexpected error occurred: "https://registry.yarnpkg.com/minimatch/-/minimatch-3.0.3.tgz: Request failed \"502 Bad Gateway\"".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Yarn Dependency Install Failed
ERROR: Build failed: exit code 1
Daniel15 commented 7 years ago

This is a Cloudflare issue as it's coming from their CDN edge servers (particularly the Varnish error 503). @thejameskyle, @kittens - Do we have a contact at Cloudflare that could help with this?

jamiebuilds commented 7 years ago

There doesn't seem to be a significant issue:

[screenshot: error-rate graph, 2017-02-26]

In the last month these were the top errors (by when they peaked):

I'm gonna assume this was a temporary problem and close

jonaskello commented 7 years ago

FWIW, I'm still seeing a lot of intermittent 503s today. Our CI servers have had well over 20 builds fail with this error. I'm thinking maybe it is a DNS issue.

Anyway, I found this excellent blog post about using offline-mirror in yarn. We are switching our projects to this workflow (a sketch of the setup is below). I guess this workflow would be good for anyone experiencing network problems when fetching packages.
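
For reference, a minimal sketch of that offline-mirror workflow using Yarn's yarn-offline-mirror settings; the cache directory name here is just an example:

    # write the mirror settings into .yarnrc
    yarn config set yarn-offline-mirror ./npm-packages-offline-cache
    yarn config set yarn-offline-mirror-pruning true

    # a regular install now also copies every fetched tarball into the mirror;
    # once the mirror and yarn.lock are committed, installs can skip the registry
    yarn install
    yarn install --offline

With the mirror committed alongside yarn.lock, an offline install shouldn't need to reach registry.yarnpkg.com at all.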

jamiebuilds commented 7 years ago

@jonaskello I'm afraid that, generally, when errors are not spiking on our end, it's most likely something on yours that we cannot fix.

Daniel15 commented 7 years ago

@thejameskyle I don't think that graph shows the full story. I feel like I've encountered the 503 error more than 10 times myself, and I don't even use Yarn that frequently. Whenever it happens, using npm works fine. @jonaskello also mentioned that he has more than 20 builds failed with that error, so CloudFlare's stats seem incorrect.

I might set up some basic monitoring (e.g. try to download a small package every minute) and track how many failures occur.
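
Not a full monitoring setup, just a sketch of what that probe could look like from cron; the package URL and log path are arbitrary examples:

    #!/bin/sh
    # probe the registry once a minute, e.g. via cron: * * * * * /usr/local/bin/registry-probe.sh
    URL="https://registry.yarnpkg.com/left-pad/-/left-pad-1.1.3.tgz"
    CODE=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 "$URL")
    echo "$(date -u +%FT%TZ) $CODE" >> /var/log/registry-probe.log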

jamiebuilds commented 7 years ago

Unless this issue becomes widespread I'm gonna keep saying it's probably something wrong at your end. If we had a spike in errors there would be people pouring into our issue tracker.

jonaskello commented 7 years ago

I would tend to agree that if it is only affecting a few clients it is something local. What strikes me as odd though is that the request gets a response from the server. If there were network issues that impeded the request, I would not expect the server to send a response like 503. From what I understand, getting this response implies that the http connection is fully functional all the way to the registry.yarnpkg.com server. I'm thinking maybe we sometimes hit a server that is not working properly. The latter could be related to the way the DNS for registry.yarnpkg.com is resolved. I can see that from my end it has 5 records:

$ dig registry.yarnpkg.com
...
;; ANSWER SECTION:
registry.yarnpkg.com.   253 IN  A   104.16.63.173
registry.yarnpkg.com.   253 IN  A   104.16.59.173
registry.yarnpkg.com.   253 IN  A   104.16.61.173
registry.yarnpkg.com.   253 IN  A   104.16.60.173
registry.yarnpkg.com.   253 IN  A   104.16.62.173

If I just ping registry.yarnpkg.com it seems to always resolve to 104.16.59.173. I'm just guessing here, of course. Since yarn does not report which IP it resolved to when it gets a 503, there is no easy way to know. I guess I could set something up with tcpdump, but since offline-mirror is working nicely now this is not much of a problem for us.
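
Still, one low-effort way to check the individual backends without tcpdump would be to pin curl to each of those A records in turn and compare status codes. A rough sketch, reusing a tarball URL from earlier in this thread:

    for ip in 104.16.59.173 104.16.60.173 104.16.61.173 104.16.62.173 104.16.63.173; do
      # force curl to use this specific address for registry.yarnpkg.com
      code=$(curl -s -o /dev/null -w '%{http_code}' \
        --resolve "registry.yarnpkg.com:443:$ip" \
        "https://registry.yarnpkg.com/minimatch/-/minimatch-3.0.3.tgz")
      echo "$ip -> $code"
    done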

dknecht commented 7 years ago

A 503 error wouldn't be created by Cloudflare. That is an error that would be generated by the origin backend.

Daniel15 commented 7 years ago

Another thought I had: what if this is occurring on npm's end, but is exacerbated by the fact that there are so many requests coming from CloudFlare IP ranges? If their load balancers route based on a hash of the origin IP address, Yarn requests might be routed disproportionately compared to other requests to npm, and thus see a different error rate. Maybe we need to loop in someone from npm to clarify how they route requests to their origin servers.

Seeing as several Yarn users have mentioned that this only occurs when using Yarn and not using npm, there's definitely something going wrong somewhere, I'm just not entirely sure where.

jamiebuilds commented 7 years ago

@Daniel15 When I last spoke to npm, they looked up the error rate for yarn requests and it was actually a lower error rate than npm's. It was a small dataset (last 1M requests), but it doesn't seem like there's a huge problem there.

I would hold off trying to diagnose a problem until we can actually see evidence of that problem, because so far every metric we've seen looks fine.

jamiebuilds commented 7 years ago

I'm afraid we can't really use anecdotal evidence for this. It's not really meaningful to have every person who's had an error come to us; we're already at a scale where we're always going to have some errors, and a lot of the time those will be things we can't control.

So until we can find evidence that there is something broken that we can actually fix, we're not going to be spending our time usefully.

TL;DR: Measure the problem before you try to fix it. Don't report anecdotal evidence.

I'm going to start deleting comments from people reporting their own errors; just hit the 👍 on the issue.

Daniel15 commented 7 years ago

until we can actually see evidence of that problem

If people are seeing "502 Bad Gateway", I'm not sure what extra evidence we need. I mean it's always possible that a corporate proxy server is throwing that error, but it's likely that most users don't go through a proxy and are receiving that error either from CloudFlare, from Fastly, or from npm's origin servers. Either way, it's not a client-side issue.

@thejameskyle, what additional evidence would be beneficial for this issue? Would it be helpful to capture the full request and response headers whenever a 5xx error occurs? Perhaps we could add a verbose debugging mode to Yarn that does that.
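
Until something like that exists in the client, the same information can be grabbed by hand when a failure happens, e.g. with curl; the URL below is the tarball from the original report, and the CDN-identifying headers such as CF-Ray or X-Served-By should show which layer produced the 5xx:

    # print response headers only; body is discarded
    curl -sS -D - -o /dev/null \
      "https://registry.yarnpkg.com/cryptiles/-/cryptiles-2.0.5.tgz"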

I'm going to reopen this issue given people are still reporting it, so that we don't get any new duplicate issues.

jamiebuilds commented 7 years ago

Like I said, there's always going to be a number of random errors that we cannot control. There's no point in chasing all of them if simply running the command again fixes the problem.

We have three separate reports which show no evidence of a problem. If you don't trust any of those reports then open up an issue for adding reporting to the Yarn client.

Until then, I'm just going to keep closing every issue opened like this, because it's not useful.

mattkime commented 7 years ago

Does the Yarn team get any feedback on how often the 502 errors occur? Is there a method for collecting this info?

jamiebuilds commented 7 years ago

Yes, we have Cloudflare, npm, and Fastly. All three show low error rates.

mattkime commented 7 years ago

Something doesn't add up. I see these problems every day.

jamiebuilds commented 7 years ago

@mattkime Then open up a PR to add error reporting to the Yarn client, which proves the problem and provides us with data we can look into.

jamiebuilds commented 7 years ago

Moving to https://github.com/yarnpkg/yarn/issues/2848

mattkime commented 7 years ago

I wonder if this is more likely while running yarn install from inside a docker container and whether yarn retries when it gets a 502 error.

At any rate, it's possible that a certain segment of users sees the problem quite frequently while others pretty much never do.

Daniel15 commented 7 years ago

I also filed #2849 to double-check our retry logic and ensure it's working properly. Yarn is supposed to retry when downloads / API requests fail, but we might not be retrying correctly. We could verify this by intentionally injecting 5xx responses (e.g. by using a debugging proxy such as Fiddler) and seeing how Yarn handles it.
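
Fiddler aside, even a throwaway local endpoint that always answers 502 would be enough to observe the retry behaviour. A crude sketch, assuming a netcat variant that accepts -l with a bare port number (flags differ between nc implementations); remember to restore the registry setting afterwards:

    # terminal 1: answer every connection with a bare 502
    while true; do
      printf 'HTTP/1.1 502 Bad Gateway\r\nContent-Length: 0\r\nConnection: close\r\n\r\n' | nc -l 8080
    done

    # terminal 2: point yarn at the fake registry and watch how it retries
    yarn config set registry http://localhost:8080
    yarn install
    yarn config set registry https://registry.yarnpkg.com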

jamiebuilds commented 7 years ago

I'm going to lock this issue now, because the solution is to do #2848.