mozmeao / infra

Mozilla Marketing Engineering and Operations Infrastructure
https://mozilla.github.io/meao/
Mozilla Public License 2.0

Tokyo errors #355

Closed jgmize closed 5 years ago

jgmize commented 7 years ago

https://papertrailapp.com/groups/4220732/events?q=error

https://synthetics.newrelic.com/accounts/1299394/monitors/81a0480c-5c13-4539-8063-a38472f3d2d2/results/4ddedfbe-a7b2-4def-82d9-7d0ba789a7dc?via=email&view=timeline

jgmize commented 7 years ago
Jul 19 18:12:19 ip-172-20-36-167 kubernetes.var.log.containers.bedrock-prod-web-4250502162-k8x03_: {"log":"BasketException: ('Connection aborted.', error(104, 'Connection reset by peer'))\n","stream":"stdout","time":"2017-07-20T01:12:18.496291567Z"}

Should only cause issues on pages that use basket, not on the / URL that had 2 504s in a row from the NR synthetics monitor linked above.
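To line this log entry up against the alert times quoted elsewhere in this thread, the JSON payload can be parsed and its timestamp read as UTC. This is a minimal sketch: `parse_container_log` is a hypothetical helper name, and the UTC-7 offset for the syslog prefix is an assumption inferred from the two timestamps in the line above, not something the log states.

```python
import json
from datetime import datetime, timedelta, timezone

# JSON payload copied from the Papertrail line above (the part after the pod name).
RAW = r'''{"log":"BasketException: ('Connection aborted.', error(104, 'Connection reset by peer'))\n","stream":"stdout","time":"2017-07-20T01:12:18.496291567Z"}'''


def parse_container_log(payload):
    """Return (message, stream, UTC datetime) from one container log record."""
    record = json.loads(payload)
    # strptime cannot read nanoseconds, so keep only whole seconds.
    whole_seconds = record["time"].rstrip("Z").split(".")[0]
    when = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    return record["log"].rstrip("\n"), record["stream"], when.replace(tzinfo=timezone.utc)


message, stream, when_utc = parse_container_log(RAW)
# Assumption: the syslog prefix ("Jul 19 18:12:19") is rendered in a UTC-7 zone.
when_local = when_utc.astimezone(timezone(timedelta(hours=-7)))
print(message)
print(when_utc.isoformat(), "->", when_local.strftime("%b %d %H:%M:%S"))
```

Read this way, the BasketException was logged at 01:12:18 UTC, which makes it directly comparable to the UTC times in the NR notifications.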

jgmize commented 7 years ago

No 5XX ELB errors in the last 3 hours on the bedrock-prod ELB according to https://ap-northeast-1.console.aws.amazon.com/ec2/v2/home?region=ap-northeast-1#LoadBalancers

jgmize commented 7 years ago

Text of NR error:

 www.mozilla.org Monitor 
 Tokyo, JP Location 
 07/20/2017 00:52:34 UTC Time 
 Error log: 

HTTPError: Server replied with a HTTP 504 response code

Text of NR alert closed notification:

 www.mozilla.org Monitor 
 Tokyo, JP Location 
 07/20/2017 00:54:30 UTC (1 minute 55 seconds downtime)

jgmize commented 7 years ago

The 1 minute 55 seconds of downtime above measures the time from the first 504 to the time the 3rd request from synthetics received the 301 response it expected on that URL.
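As a quick check of that figure, the two timestamps from the NR notifications can be subtracted directly. This is only a sketch using the whole-second values shown in the emails; the one-second gap between its result and the reported downtime is presumably down to sub-second precision that the emails do not show (an assumption).

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S"

# Timestamps copied from the NR error and alert-closed notifications (both UTC).
opened = datetime.strptime("07/20/2017 00:52:34", FMT)
closed = datetime.strptime("07/20/2017 00:54:30", FMT)

elapsed = closed - opened
minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
print(f"{minutes} minute(s) {seconds} second(s)")  # prints "1 minute(s) 56 second(s)"
```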

jgmize commented 7 years ago

No 504 error codes were returned by bedrock itself in at least the past 3 days according to https://papertrailapp.com/groups/4220732/events?q=bedrock%20%22%20504%20%22&focus=824486437775958033

jgmize commented 7 years ago

https://www.cloudflare.com/a/analytics/mozilla.org/status_codes shows 20 responses with status 504 from Osaka, Japan in the past hour. Looks like 2 of those 20 were seen by the NR synthetics monitor in a row, which triggered the alert.
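The "two in a row" behaviour can be illustrated with a small consecutive-failure check. This is a hypothetical sketch, not New Relic's actual alerting logic: `first_alert_index` and the threshold of 2 are assumptions based only on the observation above.

```python
from typing import Iterable, Optional


def first_alert_index(status_codes: Iterable[int], threshold: int = 2) -> Optional[int]:
    """Return the index of the check that completes `threshold` consecutive 5XX results."""
    run = 0
    for index, code in enumerate(status_codes):
        if code >= 500:
            run += 1
            if run >= threshold:
                return index
        else:
            # Any non-5XX response (such as the expected 301) resets the streak.
            run = 0
    return None


# Two 504s back to back trip the check; isolated 504s do not.
print(first_alert_index([301, 504, 504, 301]))  # prints 2
print(first_alert_index([504, 301, 504, 301]))  # prints None
```

With a rule like this, scattered 504s among otherwise healthy checks would go unnoticed by the monitor until two happen to land on consecutive checks.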