williamcodes closed this issue 8 years ago
Yeah weird, it seems to be dying and printing out its name, e.g.:
Dec 17 23:20:46 setup.heatseeknyc.com docker[9341]: batch
The 404s and 400s should be fine, so it seems more like a weird Docker, CoreOS, or memory issue, perhaps related to why the other units are dying. I'll try to investigate soon.
Issue #16 was a duplicate of this. It appears that hub 2506 wasn't the only one that the app server wasn't receiving readings for. Restarting the relay server has restarted transmission for now. That trick seems to only keep us up for about a day though.
Restarting the relay server is no longer causing all the readings to transmit. Here are some examples of hubs with unrelayed readings:
http://relay.heatseeknyc.com/hubs/0013a20040dabe04
http://relay.heatseeknyc.com/hubs/0013a20040dc2744
http://relay.heatseeknyc.com/hubs/0013a20040dc272d
How can we fix this?
$ journalctl -ru batch
-- Logs begin at Sat 2015-12-12 02:52:27 UTC, end at Tue 2016-01-05 22:12:02 UTC. --
Jan 05 22:12:01 setup.heatseeknyc.com docker[988]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Jan 05 22:12:01 setup.heatseeknyc.com docker[988]: INFO:root:POSTing {'time': 1451702301.0, 'sensor_name': '0013a20040d847e3', 'verification': 'c0ffee', 'temp': 76.13}...
Jan 05 22:12:00 setup.heatseeknyc.com docker[988]: ERROR:root:request {"reading": {"time": 1451702301.0, "sensor_name": "0013a20040d847d1", "verification": "c0ffee", "temp": 68.58}} got 500 response {"status":"500","error":"Internal Server Error"}
Jan 05 22:11:51 setup.heatseeknyc.com docker[988]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Jan 05 22:11:51 setup.heatseeknyc.com docker[988]: INFO:root:POSTing {'time': 1451702301.0, 'sensor_name': '0013a20040d847d1', 'verification': 'c0ffee', 'temp': 68.58}...
Jan 05 22:11:50 setup.heatseeknyc.com docker[988]: ERROR:root:request {"reading": {"time": 1451698701.0, "sensor_name": "0013a20040d847d1", "verification": "c0ffee", "temp": 68.0}} got 500 response {"status":"500","error":"Internal Server Error"}
Jan 05 22:11:40 setup.heatseeknyc.com docker[988]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Jan 05 22:11:40 setup.heatseeknyc.com docker[988]: INFO:root:POSTing {'time': 1451698701.0, 'sensor_name': '0013a20040d847d1', 'verification': 'c0ffee', 'temp': 68.0}...
Jan 05 22:11:39 setup.heatseeknyc.com docker[988]: ERROR:root:request {"reading": {"time": 1451698701.0, "sensor_name": "0013a20040d847e3", "verification": "c0ffee", "temp": 76.13}} got 500 response {"status":"500","error":"Internal Server Error"}
…
So http://heatseeknyc.com/readings.json is returning 500s as far as I can tell :sob:
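For reference, here's a minimal sketch of the request the batch unit appears to be making, reconstructed purely from the log lines above — the field names, the `{"reading": ...}` envelope, and the endpoint are taken from the logged POSTs and error responses; nothing else is confirmed:

```python
import json

# Payload shape copied from the batch unit's log lines above;
# the field names and "verification" value are exactly as logged.
reading = {
    "time": 1451702301.0,
    "sensor_name": "0013a20040d847e3",
    "verification": "c0ffee",
    "temp": 76.13,
}

# Per the ERROR lines, the relay wraps each reading in a
# {"reading": ...} envelope before POSTing it to
# http://heatseeknyc.com/readings.json.
body = json.dumps({"reading": reading})
print(body)
```

Replaying one of these bodies with curl against readings.json should reproduce the 500 if the app end is the problem.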
Seems to be working now, do you know what the issue was?
Okay, I switched to a new API key for WUnderground, and that endpoint is now returning 200s. Maybe I pushed a change that made things fail harder than normal when we get rate limited. Either way, it looks like the issue is on the app end, not the relay end, so I'm closing this issue for now.
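If rate limiting really is what turned into hard 500s, one way to make the relay fail softer is to retry 5xx responses with exponential backoff instead of burning through the queue. A minimal sketch, not the relay's actual code — `post_reading` is a hypothetical callable returning an HTTP status code, and `sleep` is injectable so the logic can be exercised without waiting:

```python
import time

def post_with_backoff(post_reading, reading, max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Retry a POST on 5xx responses with exponential backoff.

    post_reading is a hypothetical callable returning an HTTP status
    code. 2xx and 4xx results are returned immediately (only server
    errors are worth retrying); 5xx results back off 1s, 2s, 4s, ...
    """
    status = None
    for attempt in range(max_attempts):
        status = post_reading(reading)
        if status < 500:
            return status
        sleep(base_delay * (2 ** attempt))
    return status  # still failing after max_attempts
```

Something like this in the batch loop would keep a transient rate-limit blip from stalling the whole backlog.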
This appears to have affected all cells; I checked 6 and they all had unrelayed readings. I restarted the relay server and it started transmitting again. The first reading came in at 2015-12-18 13:54:11 +0000, and it looks like we're back up to current. What could have caused it?
It can't be a duplicates issue, since that's already been fixed and the app logs aren't full of requests causing 400s. In fact, the app logs show the last reading being posted at 2015-12-17 23:20:45 +0000; it was a 404 for sensor 26D0. There are actually a good number of 404s. Perhaps they're causing the same problem that the 400s were causing?
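If the 404s are wedging the queue the way the 400s did, one fix is to treat 4xx responses as permanent failures for that reading (quarantine it) and only re-queue on 5xx, so one unknown sensor like 26D0 can't block everyone else's readings. A sketch under those assumptions — `post_reading` is again a hypothetical callable returning a status code, not the relay's real API:

```python
def drain_queue(queue, post_reading):
    """Drain a batch of readings without letting one bad reading block the rest.

    post_reading is a hypothetical callable returning an HTTP status code.
    4xx responses (e.g. a 404 for an unknown sensor) are treated as
    permanent failures and quarantined; 5xx responses are transient,
    so those readings are kept for a later retry.
    """
    retry, dead = [], []
    for reading in queue:
        status = post_reading(reading)
        if 200 <= status < 300:
            continue                # delivered, nothing to keep
        elif 400 <= status < 500:
            dead.append(reading)    # permanent: don't re-queue 404s/400s
        else:
            retry.append(reading)   # transient: try again next pass
    return retry, dead
```

The quarantined (`dead`) list could then be logged or surfaced on the hub pages so bad sensor names get noticed instead of silently retried forever.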