ybrs / single-beat

ensures only one instance of your process across your servers
MIT License

Fence token not always set #20

Closed: timthelion closed this issue 5 years ago

timthelion commented 5 years ago

I'm running on Heroku.

2019-01-13T16:20:08.876536+00:00 app[worker.1]: ERROR:tornado.application:Exception in callback <bound method Process.timer_cb of <singlebeat.beat.Process object at 0x7f2d46a19080>>
2019-01-13T16:20:08.876546+00:00 app[worker.1]: Traceback (most recent call last):
2019-01-13T16:20:08.876552+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/tornado/ioloop.py", line 1229, in _run
2019-01-13T16:20:08.876554+00:00 app[worker.1]: return self.callback()
2019-01-13T16:20:08.876555+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/singlebeat/beat.py", line 139, in timer_cb
2019-01-13T16:20:08.876557+00:00 app[worker.1]: fn()
2019-01-13T16:20:08.876559+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/singlebeat/beat.py", line 120, in timer_cb_running
2019-01-13T16:20:08.876560+00:00 app[worker.1]: redis_fence_token = rds.get("SINGLE_BEAT_{identifier}".format(identifier=self.identifier)).split(b":")[0]
2019-01-13T16:20:08.876752+00:00 app[worker.1]: AttributeError: 'NoneType' object has no attribute 'split'
~ $ pip3 show single-beat
Name: single-beat
Version: 0.3.1
Summary: ensures only one instance of your process across your servers
Home-page: https://github.com/ybrs/single-beat
Author: None
Author-email: None
License: UNKNOWN
Location: /app/.heroku/python/lib/python3.6/site-packages
Requires: redis, tornado

timthelion commented 5 years ago

@joekohlsdorf https://github.com/ybrs/single-beat/pull/18

joekohlsdorf commented 5 years ago

Can you please post a log of a run with SINGLE_BEAT_LOG_LEVEL=debug?

timthelion commented 5 years ago

I'll try. I got the error only after running a celery task and not on startup. But I just loaded up everything again and did not reproduce the bug :/

joekohlsdorf commented 5 years ago

I guess what is happening here is that your task is taking up so much CPU time that the lock expires in Redis because it doesn't get updated in time. This is exactly what the fencing feature is meant for. single-beat should obviously exit cleanly instead of failing with an exception; I'll fix this.
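
For illustration only, here is a minimal sketch of the kind of guard this implies. It is not the actual fix. It assumes the lock value has the form `b"<token>:<rest>"`, as the `split(b":")[0]` in the traceback suggests, and that a missing key comes back as `None`. The helper names `parse_fence_token` and `still_owns_lock` are hypothetical and not part of single-beat.

```python
from typing import Optional


def parse_fence_token(raw: Optional[bytes]) -> Optional[bytes]:
    """Return the fence token from a lock value, or None if the lock is gone.

    Assumes the stored value looks like b"<token>:<rest>".
    """
    if raw is None:  # key expired or was never set
        return None
    return raw.split(b":")[0]


def still_owns_lock(raw: Optional[bytes], my_token: bytes) -> bool:
    """True only if the lock exists and carries our token."""
    return parse_fence_token(raw) == my_token
```

With a check like this, the timer callback could treat an expired lock the same way as a lock held by someone else and shut down cleanly instead of raising `AttributeError`.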

You can probably mitigate this by raising SINGLE_BEAT_LOCK_TIME; the default value is 5.
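
To show why a longer lock time helps, here is a rough simulation, assuming the lock is a key with an expiry that the process must refresh before it runs out. `FakeTTLStore` and `lock_after_stall` are hypothetical stand-ins with a manual clock. They are not Redis or single-beat code, and the numbers are in arbitrary time units.

```python
from typing import Dict, Optional, Tuple


class FakeTTLStore:
    """In-memory stand-in for a key with an expiry (hypothetical, not Redis)."""

    def __init__(self) -> None:
        self.now = 0.0  # manual clock, in the same units as the TTL
        self._data: Dict[str, Tuple[bytes, float]] = {}

    def set(self, key: str, value: bytes, ttl: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self.now + ttl)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None or self.now >= entry[1]:
            return None  # missing or expired
        return entry[0]


def lock_after_stall(lock_time: float, stall: float) -> Optional[bytes]:
    """Set the lock, stall for `stall` time units, then read it back."""
    store = FakeTTLStore()
    store.set("SINGLE_BEAT_demo", b"token:host", ttl=lock_time)
    store.now += stall  # the busy task delays the refresh
    return store.get("SINGLE_BEAT_demo")


print(lock_after_stall(lock_time=5, stall=7))   # None
print(lock_after_stall(lock_time=10, stall=7))  # b'token:host'
```

A stall longer than the lock time leaves nothing to read back, which matches the `NoneType` error in the traceback. A lock time larger than the longest expected stall keeps the key alive.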

You need to use a supervisor to restart single-beat in case it exits; please see the example configuration linked in the README.