heatseeknyc / relay

web app for setting up sensors, receiving and storing their data, and viewing it

relay server stopped relaying readings to main app again #13

Closed williamcodes closed 8 years ago

williamcodes commented 8 years ago

This appears to have affected all cells: I checked 6 and they all had unrelayed readings. I restarted the relay server and it started transmitting again. The first reading came in at 2015-12-18 13:54:11 +0000. It looks like we're back up to current. What could have caused it?

It can't be a duplicates issue, since that's been fixed already and the app logs aren't full of requests causing 400s. In fact, the app logs show the last reading being posted at 2015-12-17 23:20:45 +0000. It was a 404 for sensor 26D0. It looks like there are a good number of 404s, actually. Perhaps they're causing the same problem that the 400s were causing?

hrldcpr commented 8 years ago

Yeah weird, it seems to be dying and printing out its name, e.g.:

Dec 17 23:20:46 setup.heatseeknyc.com docker[9341]: batch

The 404s and 400s should be fine, so this looks more like a Docker, CoreOS, or memory issue, perhaps related to why the other units are dying. I'll try to investigate soon.
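
For reference, here is a minimal sketch of the kind of policy under which 404s and 400s are harmless: a rejected reading is dropped from the queue rather than retried forever, and only server-side failures are kept for later. This is an assumption for illustration, not the relay's actual code; `classify_response`, `process_batch`, and the `post` callable are hypothetical names.

```python
def classify_response(status_code):
    """Decide what to do with a reading after the app responds.

    Hypothetical policy, not taken from the relay source:
    - 2xx: the reading was stored, so mark it relayed
    - 429: the app is rate limiting us, so retry later
    - other 4xx: the app rejected it (e.g. 400 duplicate, 404 unknown
      sensor); retrying will not help, so drop it from the queue
    - anything else (5xx, unexpected codes): keep it and retry later
    """
    if 200 <= status_code < 300:
        return "relayed"
    if status_code == 429:
        return "retry"
    if 400 <= status_code < 500:
        return "rejected"
    return "retry"


def process_batch(readings, post):
    """Post each reading and return the ones that should be retried.

    `post` is any callable that takes a reading dict and returns an
    HTTP status code (hypothetical interface).
    """
    retry = []
    for reading in readings:
        try:
            outcome = classify_response(post(reading))
        except Exception:
            # A network error on one reading should not kill the whole loop.
            outcome = "retry"
        if outcome == "retry":
            retry.append(reading)
    return retry
```

With a policy like this, a stream of 404s only shrinks the queue, while a run of 500s leaves the same readings at the front of it.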

williamcodes commented 8 years ago

Issue #16 was a duplicate of this. It appears that hub 2506 wasn't the only one that the app server wasn't receiving readings for. Restarting the relay server has restarted transmission for now. That trick seems to only keep us up for about a day though.

williamcodes commented 8 years ago

Restarting the relay server is no longer causing all the readings to transmit. Here are some examples of hubs with unrelayed readings:

http://relay.heatseeknyc.com/hubs/0013a20040dabe04
http://relay.heatseeknyc.com/hubs/0013a20040dc2744
http://relay.heatseeknyc.com/hubs/0013a20040dc272d

How can we fix this?

hrldcpr commented 8 years ago

$ journalctl -ru batch
-- Logs begin at Sat 2015-12-12 02:52:27 UTC, end at Tue 2016-01-05 22:12:02 UTC. --
Jan 05 22:12:01 setup.heatseeknyc.com docker[988]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Jan 05 22:12:01 setup.heatseeknyc.com docker[988]: INFO:root:POSTing {'time': 1451702301.0, 'sensor_name': '0013a20040d847e3', 'verification': 'c0ffee', 'temp': 76.13}...
Jan 05 22:12:00 setup.heatseeknyc.com docker[988]: ERROR:root:request {"reading": {"time": 1451702301.0, "sensor_name": "0013a20040d847d1", "verification": "c0ffee", "temp": 68.58}} got 500 response {"status":"500","error":"Internal Server Error"}
Jan 05 22:11:51 setup.heatseeknyc.com docker[988]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Jan 05 22:11:51 setup.heatseeknyc.com docker[988]: INFO:root:POSTing {'time': 1451702301.0, 'sensor_name': '0013a20040d847d1', 'verification': 'c0ffee', 'temp': 68.58}...
Jan 05 22:11:50 setup.heatseeknyc.com docker[988]: ERROR:root:request {"reading": {"time": 1451698701.0, "sensor_name": "0013a20040d847d1", "verification": "c0ffee", "temp": 68.0}} got 500 response {"status":"500","error":"Internal Server Error"}
Jan 05 22:11:40 setup.heatseeknyc.com docker[988]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Jan 05 22:11:40 setup.heatseeknyc.com docker[988]: INFO:root:POSTing {'time': 1451698701.0, 'sensor_name': '0013a20040d847d1', 'verification': 'c0ffee', 'temp': 68.0}...
Jan 05 22:11:39 setup.heatseeknyc.com docker[988]: ERROR:root:request {"reading": {"time": 1451698701.0, "sensor_name": "0013a20040d847e3", "verification": "c0ffee", "temp": 76.13}} got 500 response {"status":"500","error":"Internal Server Error"}
…

So http://heatseeknyc.com/readings.json is returning 500s as far as I can tell :sob:
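
To check the endpoint independently of the relay, something like the following sketch could replay a single reading and report the status code. The payload shape is copied from the log lines above; sending it as a JSON body with a `Content-Type: application/json` header is an assumption, as are the helper names `build_request` and `post_status`.

```python
import json
import urllib.error
import urllib.request

READINGS_URL = "http://heatseeknyc.com/readings.json"  # endpoint named above


def build_request(reading, url=READINGS_URL):
    """Wrap a reading as {"reading": {...}}, the shape shown in the logs.

    Assumption: the app accepts a JSON body; the relay may encode it differently.
    """
    body = json.dumps({"reading": reading}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_status(request, timeout=10):
    """Send the request and return the HTTP status code, errors included."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the code itself is what we want to see.
        return error.code
```

Calling `post_status(build_request(reading))` sends a real POST to the app, so it is best run with a reading that is safe to store or reject.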

hrldcpr commented 8 years ago

Seems to be working now. Do you know what the issue was?

williamcodes commented 8 years ago

Okay, I switched to a new API key for WUnderground and that endpoint is now returning 200s. Maybe I pushed a change that made things fail harder than normal when we get rate limited. Either way, it looks like the issue is on the app end, not the relay end, so I'm closing this issue for now.