tenex / rails-assets

The solution to assets management in Rails
https://rails-assets.org
MIT License

Frequent timeouts #397

Closed · rrrkren closed this 7 years ago

rrrkren commented 7 years ago

Our CI system has recently been encountering very frequent timeouts when installing gems sourced from rails-assets.org. Here is an example error message:

Gem::RemoteFetcher::UnknownHostError: timed out
(https://rails-assets.org/gems/rails-assets-iso-currency-0.2.7.gem)
An error occurred while installing rails-assets-iso-currency (0.2.7), and
Bundler cannot continue.
Make sure that `gem install rails-assets-iso-currency -v '0.2.7'` succeeds
before bundling.

The gem that times out varies from run to run, and most gems from rails-assets.org do install, but roughly 60% of the time a timeout occurs and fails the CI build. Is there anything we might be doing wrong, or any workaround you could suggest? Thanks a lot!
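One CI-side mitigation (a sketch, assuming Bundler is doing the fetching; the values below are only illustrative) is to raise Bundler's network timeout and retry settings so that a single slow request is retried instead of failing the build:

# Raise Bundler's per-request timeout and retry count (defaults are 10 seconds and 3 retries)
bundle config timeout 30
bundle config retry 5

# Or pass the retry count per invocation on CI
bundle install --jobs=4 --retry=5

This does not address the root cause, but it reduces how often one slow fetch fails the whole build.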

hut8 commented 7 years ago

That's a DNS issue, not really a timeout connecting to us. Are you experiencing any other DNS issues? Can you re-run it with more verbose output?
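A minimal set of commands for gathering that output (illustrative; adjust to your environment):

# Re-run the install with verbose Bundler logging
bundle install --verbose

# Check DNS resolution from the machine that is failing
nslookup rails-assets.org
dig rails-assets.org A +short

# Time a direct HTTPS request to separate DNS failures from connection failures
curl -v --max-time 30 -o /dev/null https://rails-assets.org/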

dinjas commented 7 years ago

I'm also having trouble with this on CI today. The site doesn't load in my browser here either.

nickgsc commented 7 years ago

I'm seeing frequent errors from Heroku as well, and I'm unable to load the page in a browser locally:

-----> Installing dependencies using bundler 1.13.7
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching source index from https://rails-assets.org/
       Retrying fetcher due to error (2/4): Bundler::HTTPError Could not fetch specs from https://rails-assets.org/
       Retrying fetcher due to error (3/4): Bundler::HTTPError Could not fetch specs from https://rails-assets.org/
       Retrying fetcher due to error (4/4): Bundler::HTTPError Could not fetch specs from https://rails-assets.org/
       Could not fetch specs from https://rails-assets.org/
       Bundler Output: Fetching source index from https://rails-assets.org/

The DNS entry resolves to only one IP address for me here (using Google's public DNS):

Server:     8.8.8.8
Address:    8.8.8.8#53

Non-authoritative answer:
Name:   rails-assets.org
Address: 198.199.120.180

dvmonroe commented 7 years ago

@hut8

$ curl --silent --verbose --output dev/null rails-assets.org
* Rebuilt URL to: rails-assets.org/
*   Trying 198.199.120.180...
* TCP_NODELAY set
* Connected to rails-assets.org (198.199.120.180) port 80 (#0)
> GET / HTTP/1.1
> Host: rails-assets.org
> User-Agent: curl/7.51.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Curl_http_done: called premature == 1
* Closing connection 0
# from a bundle update locally
Fetching source index from https://rails-assets.org/

Retrying fetcher due to error (2/4): Bundler::HTTPError Could not fetch specs from https://rails-assets.org/
Retrying fetcher due to error (3/4): Bundler::HTTPError Could not fetch specs from https://rails-assets.org/
Retrying fetcher due to error (4/4): Bundler::HTTPError Could not fetch specs from https://rails-assets.org/

mparmer commented 7 years ago

We are also getting timeouts from rails-assets.org today; we cannot fetch specs for our gems, and we cannot reach the website either.

hut8 commented 7 years ago

Yeah, we have a big issue here. Trying to fix it now.

hut8 commented 7 years ago

I've rebooted the primary instance. A little while ago, the CPU usage plummeted. It might be related to Digital Ocean's current storage issues in NYC1, but I'm not sure why the failover process wasn't invoked automatically.

hut8 commented 7 years ago

Ah, looks like the failover sort of worked:

This is the secondary:

[image]

This is the primary:

[image]

So it did actually fail over. Unfortunately both are virtually unreachable. This is probably due to the storage issues in NYC1. That issue has been in progress with DO for four hours, but has only affected our droplets for an hour and a half. This is pretty severe, but there isn't a whole lot we can do since both our primary and secondary are connected to storage volumes in NYC1. I'll keep you guys updated.

hut8 commented 7 years ago

The primary has been rebooting for 11 minutes. I don't think that opening a ticket with DO will help since they're obviously aware.

masonjm commented 7 years ago

@hut8 thanks for staying on top of it! You guys run a great service :).

hut8 commented 7 years ago

I'm able to get into the secondary server now. However, nginx and all of its workers are stuck in disk sleep. I'm going to try to resolve this without power cycling, but that probably won't do much. Aside from that, we'll likely need to wait for DO to restore its Volumes service.
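For context, "disk sleep" is process state D (uninterruptible sleep), which usually means the processes are blocked on I/O to the stalled volume. A quick way to confirm that, as a sketch:

# List processes stuck in uninterruptible (disk) sleep; their STAT column contains "D"
ps -eo pid,stat,comm | awk '$2 ~ /D/'

# nginx-specific view of the master and worker process states
ps -C nginx -o pid,stat,cmd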

hut8 commented 7 years ago

Might be fixed. I'm leaving it on the secondary now because the primary seems more unstable (although I have no way of diagnosing why other than "Something is weird with the Digital Ocean volumes").

hut8 commented 7 years ago

Alright, it looks unstable again. https://status.digitalocean.com/ is the status page for this issue.

waynerobinson commented 7 years ago

https://rails-assets.org is unreachable for us in Australia. :-(

shineli1984 commented 7 years ago

Still experiencing the issue

Bundler::HTTPError Could not fetch specs from http://rails-assets.org/

This was 10 minutes ago on our latest CI build.

joshjordan commented 7 years ago

Hi @shineli1984! As @hut8 pointed out, https://status.digitalocean.com/ is the status page you can check for this. Our hosting provider is still down.

hut8 commented 7 years ago

Everything looks good. Please open a new issue if other problems crop up. We'll have an internal post-mortem tomorrow and decide how to better scale the infrastructure; at the time we deployed the failover node, NYC1 was the only region with block storage available. Now that there are more options, we'll be moving it to a different region shortly. Thanks for your patience, everyone!

hut8 commented 7 years ago

Provisioning of the new failover node in SFO2 (which makes the infrastructure fully redundant across regions) is complete.