notify when hubs go offline

heatseeknyc / relay

web app for setting up sensors, receiving and storing their data, and viewing it


notify when hubs go offline #20

Closed williamcodes closed 8 years ago

williamcodes commented 8 years ago

As Heat Seek staff, when a hub goes offline, we would like some kind of notification so that we can call the tenant or replace their hub.

Notes: Looping @essejhsif in on this since he wants to tackle it. Here's an example of a hub that went down that we wish we'd known about back when it happened.

http://relay.heatseeknyc.com/cells/0013a20040c17fab

hrldcpr commented 8 years ago

Possibly this should be on the main app (i.e. http://heatseeknyc.com, what do we call that app?) so that it's a full end-to-end notification. I.e. if it stops receiving readings for any given cell, it should send an alert. This would catch not only hubs dying but also relay server dying and any other possible issues.
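A minimal sketch of that end-to-end check, assuming the app can look up the timestamp of the latest reading per cell. The function name, the dictionary shape, and the three-hour threshold are all hypothetical, not taken from either codebase:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long a cell may stay silent before we alert.
STALE_AFTER = timedelta(hours=3)


def find_stale_cells(last_reading_at, now, stale_after=STALE_AFTER):
    """Return the IDs of cells whose latest reading is older than stale_after.

    last_reading_at maps a cell ID to the timestamp of its most recent reading.
    """
    return sorted(
        cell_id
        for cell_id, seen in last_reading_at.items()
        if now - seen > stale_after
    )
```

Run on a schedule, anything this returns would be a candidate for an alert, whatever the cause of the silence turns out to be.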

williamcodes commented 8 years ago

I've been referring to them as the hubs, the relay, and the app. Perhaps we could be more specific, since the relay server kind of has a web app on it too. Rails app?

I think we should have monitoring both places. You're right, full end-to-end would be great. The relay app is a little more mission critical though and we have limited human bandwidth, plus the rails app already has some basic error monitoring.

hrldcpr commented 8 years ago

Well, monitoring in the main app would implicitly detect issues in the relay app in addition to any issues further along in the pipeline, and as far as human bandwidth goes, you could implement it since you're familiar with the codebase :open_mouth: Seems like it might be easier to add and more powerful.

Of course redundant monitoring on both servers would eventually be nice, in case one of the monitoring services dies, and to pinpoint where exactly things are failing.
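As a rough sketch of how monitoring on both servers could pinpoint a failure, assuming each server records when it last saw a reading for a cell. The function name, the returned labels, and the threshold are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def locate_failure(relay_seen, app_seen, now, stale_after=timedelta(hours=3)):
    """Guess which stage stopped passing along readings for one cell.

    relay_seen and app_seen are the times each server last saw a reading
    for the cell, or None if it has never seen one.
    """
    def stale(seen):
        return seen is None or now - seen > stale_after

    if not stale(app_seen):
        return "ok"
    if stale(relay_seen):
        # Neither server has recent data, so the problem is at or before the relay.
        return "hub or relay"
    # The relay has recent data but the app does not.
    return "relay-to-app link or app"
```

If the relay's own monitoring is what died, the app-side check would still report the silence, which is the redundancy argument above.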

williamcodes commented 8 years ago

You're right, monitoring the rails app would catch issues further down the pipeline. It would also generate a lot of noise though, because there are way more open issues on the rails app than the relay app. By human bandwidth, I'm referring to the fact that Jesse volunteered to do this one. He's been meaning to implement monitoring for a while and is more comfortable in Python than in Ruby. That probably wasn't clear in the original message, I'll assign it to him. I know you're busy interpreting voltages for the v0.5 cells and have a mountain of other issues as well. I'm busy prepping and installing sensors, and there are even more pressing issues on the Rails app for me to address, especially relating to our WUnderground dependency.

hrldcpr commented 8 years ago

Oh cool yeah if Jesse wants to do it, perfecto!

williamcodes commented 8 years ago

@essejhsif is discovering the mindjob that is CoreOS. Apparently there's no package manager and definitely no python on it. It may turn out to be easier to monitor the app server. We shall see!

williamcodes commented 8 years ago

@essejhsif has discovered that it's way easier to monitor the app server. We have seen! CoreOS is its own animal, and the app server already has SendGrid which we can use for the alerts. Closing this issue for now.