heatseeknyc / relay

web app for setting up sensors, receiving and storing their data, and viewing it
0 stars 2 forks source link

batch service crashing periodically #22

Closed williamcodes closed 8 years ago

williamcodes commented 8 years ago

Here are some logs from the last time this happened. Perhaps they will be useful in determining the cause.

2016-01-11-web-batch.log 2016-01-11-journalctl-web.log

williamcodes commented 8 years ago

Harold fixed this issue several weeks ago, I think through some kind of configuration change with Docker or CoreOS. This issue appears to have resurfaced recently though.

hrldcpr commented 8 years ago

Are you sure it's crashing? It appears to have been running fine for quite a while.

One possible issue is that it's receiving 400s from the main server for "No user associated with that sensor", which means it keeps retrying, whereas if it returns a 404, which I think it used to, then it will give up for that cell. So maybe change that response code (back?) to 404? Otherwise it might fall behind on updates because it spends so long on the 400s.

Specifically I'm seeing this a lot for cell 0013a20040d848d1 but maybe there's others.

williamcodes commented 8 years ago

Yeah it was down from the 15th to the 20th, below is the relevant section of logs.

Thanks for the heads-up about 0013a20040d848d1, I marked all its readings as relay='f'.

Sometimes we set up a sensor in someone's apartment and they don't want to create an account, so we make an account for them the next day at the office. If we return 404s for "no user associated", then the backed-up readings won't come through when the user finally does get created and associated with that cell.

One option would be to set a systemd job to restart the process nightly. We could also expose an endpoint on the relay that allows the app to ask for retries rather than retrying automatically forever. Or we could keep digging and try to get to the bottom of why the process dies. There are probably other strategies I'm not thinking of as well.

Also, the web service crashed and is down at the moment, I'll leave it down for a while to give you a chance to investigate.

Mar 20 22:36:58 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 46.5, 'verification': 'c0ffee', 'time': 1443380737.0, 'sensor_name': '0013a20040dabd02'}...
Mar 20 22:36:57 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 48.8, "verification": "c0ffee", "time": 1443380725.0, "sensor_name": "0013a20040dabd02"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:57 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:57 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 48.8, 'verification': 'c0ffee', 'time': 1443380725.0, 'sensor_name': '0013a20040dabd02'}...
Mar 20 22:36:56 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 19.2, "verification": "c0ffee", "time": 1443380527.0, "sensor_name": "0013a20040dabd12"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:56 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:56 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 19.2, 'verification': 'c0ffee', 'time': 1443380527.0, 'sensor_name': '0013a20040dabd12'}...
Mar 20 22:36:55 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 21.0, "verification": "c0ffee", "time": 1443380523.0, "sensor_name": "0013a20040dabd12"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:55 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:55 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 21.0, 'verification': 'c0ffee', 'time': 1443380523.0, 'sensor_name': '0013a20040dabd12'}...
Mar 20 22:36:54 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 28.5, "verification": "c0ffee", "time": 1443380515.0, "sensor_name": "0013a20040dabd12"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:54 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:54 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 28.5, 'verification': 'c0ffee', 'time': 1443380515.0, 'sensor_name': '0013a20040dabd12'}...
Mar 20 22:36:53 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 38.4, "verification": "c0ffee", "time": 1443380507.0, "sensor_name": "0013a20040dabd12"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:53 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:53 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 38.4, 'verification': 'c0ffee', 'time': 1443380507.0, 'sensor_name': '0013a20040dabd12'}...
Mar 20 22:36:52 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 34.9, "verification": "c0ffee", "time": 1443380499.0, "sensor_name": "0013a20040dabd12"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:52 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:52 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 34.9, 'verification': 'c0ffee', 'time': 1443380499.0, 'sensor_name': '0013a20040dabd12'}...
Mar 20 22:36:51 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 21.0, "verification": "c0ffee", "time": 1443380487.0, "sensor_name": "0013a20040dabd12"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:51 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:51 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 21.0, 'verification': 'c0ffee', 'time': 1443380487.0, 'sensor_name': '0013a20040dabd12'}...
Mar 20 22:36:50 setup.heatseeknyc.com docker[26477]: ERROR:root:request {"reading": {"temp": 19.8, "verification": "c0ffee", "time": 1443380479.0, "sensor_name": "0013a20040dabd12"}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 20 22:36:49 setup.heatseeknyc.com docker[26477]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 20 22:36:49 setup.heatseeknyc.com docker[26477]: INFO:root:POSTing {'temp': 19.8, 'verification': 'c0ffee', 'time': 1443380479.0, 'sensor_name': '0013a20040dabd12'}...
Mar 20 22:36:49 setup.heatseeknyc.com docker[26477]: INFO:root:3192 unrelayed temperatures
Mar 20 22:36:48 setup.heatseeknyc.com systemd[1]: Started batch.service.
Mar 20 22:36:48 setup.heatseeknyc.com docker[26453]: 23198efbff89a10794a0b3d013fb02bcd13ad73e8bcb3fa71b2d64acbc8fa9ed
Mar 20 22:36:48 setup.heatseeknyc.com docker[26449]: batch
Mar 20 22:36:48 setup.heatseeknyc.com systemd[1]: Starting batch.service...
Mar 20 22:36:48 setup.heatseeknyc.com systemd[1]: Stopped batch.service.
Mar 15 23:46:49 setup.heatseeknyc.com docker[5318]: batch
Mar 15 23:46:38 setup.heatseeknyc.com docker[624]: ERROR:root:request {"reading": {"verification": "c0ffee", "temp": 66.8, "sensor_name": "0013a20040d848d1", "time": 1449354934.0}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 15 23:46:38 setup.heatseeknyc.com docker[624]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 15 23:46:38 setup.heatseeknyc.com docker[624]: INFO:root:POSTing {'verification': 'c0ffee', 'temp': 66.8, 'sensor_name': '0013a20040d848d1', 'time': 1449354934.0}...
Mar 15 23:46:37 setup.heatseeknyc.com docker[624]: ERROR:root:request {"reading": {"verification": "c0ffee", "temp": 66.8, "sensor_name": "0013a20040d848d1", "time": 1449351334.0}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 15 23:46:37 setup.heatseeknyc.com docker[624]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 15 23:46:37 setup.heatseeknyc.com docker[624]: INFO:root:POSTing {'verification': 'c0ffee', 'temp': 66.8, 'sensor_name': '0013a20040d848d1', 'time': 1449351334.0}...
Mar 15 23:46:36 setup.heatseeknyc.com docker[624]: ERROR:root:request {"reading": {"verification": "c0ffee", "temp": 67.4, "sensor_name": "0013a20040d848d1", "time": 1449344134.0}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 15 23:46:36 setup.heatseeknyc.com docker[624]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 15 23:46:36 setup.heatseeknyc.com docker[624]: INFO:root:POSTing {'verification': 'c0ffee', 'temp': 67.4, 'sensor_name': '0013a20040d848d1', 'time': 1449344134.0}...
Mar 15 23:46:35 setup.heatseeknyc.com docker[624]: ERROR:root:request {"reading": {"verification": "c0ffee", "temp": 68.0, "sensor_name": "0013a20040d848d1", "time": 1449340534.0}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 15 23:46:35 setup.heatseeknyc.com docker[624]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 15 23:46:35 setup.heatseeknyc.com docker[624]: INFO:root:POSTing {'verification': 'c0ffee', 'temp': 68.0, 'sensor_name': '0013a20040d848d1', 'time': 1449340534.0}...
Mar 15 23:46:34 setup.heatseeknyc.com docker[624]: ERROR:root:request {"reading": {"verification": "c0ffee", "temp": 68.6, "sensor_name": "0013a20040d848d1", "time": 1449336933.0}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 15 23:46:34 setup.heatseeknyc.com docker[624]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 15 23:46:34 setup.heatseeknyc.com docker[624]: INFO:root:POSTing {'verification': 'c0ffee', 'temp': 68.6, 'sensor_name': '0013a20040d848d1', 'time': 1449336933.0}...
Mar 15 23:46:33 setup.heatseeknyc.com docker[624]: ERROR:root:request {"reading": {"verification": "c0ffee", "temp": 69.2, "sensor_name": "0013a20040d848d1", "time": 1449333334.0}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 15 23:46:33 setup.heatseeknyc.com docker[624]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
Mar 15 23:46:33 setup.heatseeknyc.com docker[624]: INFO:root:POSTing {'verification': 'c0ffee', 'temp': 69.2, 'sensor_name': '0013a20040d848d1', 'time': 1449333334.0}...
Mar 15 23:46:32 setup.heatseeknyc.com docker[624]: ERROR:root:request {"reading": {"verification": "c0ffee", "temp": 68.6, "sensor_name": "0013a20040d848d1", "time": 1449329733.0}} got 400 response {"code":400,"error":"No user associated with that sensor"}
Mar 15 23:46:32 setup.heatseeknyc.com docker[624]: INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): heatseeknyc.com
hrldcpr commented 8 years ago

Oh okay thanks, I must not have looked far enough back in the logs.

It looks like docker/docker#20327 is still the problem, and the "fix" I rolled out earlier doesn't help because it only restarts things on failure, but this isn't considered a failure :sob: I think I can just have it restart no matter what though, which for us should be fine…

hrldcpr commented 8 years ago

Hopefully that will fix it, lemme know if there are other issues! :sob:

williamcodes commented 8 years ago

Awesome, thanks Harold!!