untoldone / bloomapi

Create APIs out of public datasources
https://www.bloomapi.com/documentation/public-data
MIT License
89 stars 29 forks source link

bloomapi.com has a 502 error #68

Closed scontapay closed 9 years ago

scontapay commented 9 years ago

http://www.bloomapi.com/

8:42 AM Pacific Time

untoldone commented 9 years ago

Thanks for reporting this!

Fixed for now. Investigating the root cause. This didn't impact the API server which remained available through the website's downtime. A bug in the monitoring code also prevented early detection -- it was only monitoring the API server but is now monitoring both API + Web servers

untoldone commented 9 years ago

A more complete (though still incomplete) analysis showed that the webserver went down around 3:54AM PST -- it came back up 9:26AM PST after the web process was manually restarted. At the time, a number of different search engines were indexing the site as well as minimal normal traffic. Something (still unknown) caused the web process to crash without logging a reason. At that point 502 started to be returned by NGINX as the up upstream server was down. Upstart auto-restarted the web process, but something continued to cause the process to crash. The Upstart config was to eventually give up on restarting the process which is what led to the longer term downtime.

Investigating the syslog, server log, and upstart log did not give any additional information other than the process was being started after it exited. None of the first requests that returned 502 or the requests immediately before the crash resulted in a reproducible crash. I have added additional logging to the web process to try to track any unexpected exits. Hopefully we'll catch this faster next time with the new logging and be able to identify a root cause.

untoldone commented 9 years ago

For the record, or if anyone else has additional thoughts here's the relevant logs:

Snippet from /var/log/bloomweb-upstart.log

[Thu Jan  8 16:49:48 EST 2015] BloomWeb Starting
[Thu Feb 12 06:54:38 EST 2015] BloomWeb Starting
[Thu Feb 12 06:54:39 EST 2015] BloomWeb Starting
[Thu Feb 12 06:54:40 EST 2015] BloomWeb Starting
[Thu Feb 12 06:54:40 EST 2015] BloomWeb Starting
[Thu Feb 12 06:54:41 EST 2015] BloomWeb Starting
[Thu Feb 12 12:26:03 EST 2015] BloomWeb Starting

Snippet from /var/log/syslog

Feb 12 06:54:38 web-01 kernel: [7136918.929504] init: bloomweb main process ended, respawning
Feb 12 06:54:39 web-01 kernel: [7136919.938390] init: bloomweb main process ended, respawning
Feb 12 06:54:40 web-01 kernel: [7136920.597809] init: bloomweb main process ended, respawning
Feb 12 06:54:40 web-01 kernel: [7136921.246691] init: bloomweb main process ended, respawning
Feb 12 06:54:41 web-01 kernel: [7136921.844517] init: bloomweb main process ended, respawning
Feb 12 06:54:41 web-01 kernel: [7136922.455762] init: bloomweb respawning too fast, stopped

From /srv/bloomweb/logs/server.log

2015-02-12T11:54:38.996Z - info: starting bloomapi
2015-02-12T11:54:38.997Z - info: Master launching 1 workers
2015-02-12T11:54:39.357Z - info: Worker 1 listening on 3000
2015-02-12T11:54:39.669Z - info: starting bloomapi
2015-02-12T11:54:39.670Z - info: Master launching 1 workers
2015-02-12T11:54:40.010Z - info: Worker 1 listening on 3000
2015-02-12T11:54:40.351Z - info: starting bloomapi
2015-02-12T11:54:40.352Z - info: Master launching 1 workers
2015-02-12T11:54:40.666Z - info: Worker 1 listening on 3000
2015-02-12T11:54:40.957Z - info: starting bloomapi
2015-02-12T11:54:40.958Z - info: Master launching 1 workers
2015-02-12T11:54:41.264Z - info: Worker 1 listening on 3000
2015-02-12T11:54:41.555Z - info: starting bloomapi
2015-02-12T11:54:41.556Z - info: Master launching 1 workers
2015-02-12T11:54:41.876Z - info: Worker 1 listening on 3000
2015-02-12T17:26:03.902Z - info: starting bloomapi