starcraftman / cogBot

A discord bot for federals!
BSD 3-Clause "New" or "Revised" License

Handle Work Pool Crashing #59

Closed: starcraftman closed this 6 years ago

starcraftman commented 6 years ago

source_traceback: Object created at (most recent call last):
  File "/home/cog/live_hudson/cog/bot.py", line 358, in <module>
    main()
  File "/home/cog/live_hudson/cog/bot.py", line 349, in main
    bot.run(cog.util.get_config('discord', os.environ.get('COG_TOKEN', 'dev')))
  File "/home/cog/.shell/pyenv/versions/3.5.3/lib/python3.5/site-packages/discord/client.py", line 519, in run
    self.loop.run_until_complete(self.start(*args, **kwargs))
  File "/home/cog/.shell/pyenv/versions/3.5.3/lib/python3.5/asyncio/coroutines.py", line 125, in send
    return self.gen.send(value)
  File "/home/cog/live_hudson/cog/jobs.py", line 163, in pool_starter
    asyncio.ensure_future(pool_monitor(live_jobs, fail_cb, delay))
Traceback (most recent call last):
  File "/home/cog/live_hudson/cog/jobs.py", line 132, in pool_monitor
    job.future.result(0.01)  # Force raising exception
  File "/home/cog/.shell/pyenv/versions/3.5.3/lib/python3.5/concurrent/futures/_base.py", line 398, in result
    return self.__get_result()
  File "/home/cog/.shell/pyenv/versions/3.5.3/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
concurrent.futures._base.TimeoutError: ('Task timeout', 15)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "uvloop/future.pyx", line 372, in uvloop.loop.BaseTask._fast_step (uvloop/loop.c:112669)
  File "/home/cog/.shell/pyenv/versions/3.5.3/lib/python3.5/asyncio/coroutines.py", line 125, in send
    return self.gen.send(value)
  File "/home/cog/live_hudson/cog/jobs.py", line 143, in pool_monitor
    job.check_timeout()
  File "/home/cog/live_hudson/cog/jobs.py", line 85, in check_timeout
    self.start()
  File "/home/cog/live_hudson/cog/jobs.py", line 71, in start
    self.future = POOL.schedule(self.func, timeout=self.timeout)
  File "/home/cog/.shell/pyenv/versions/3.5.3/lib/python3.5/site-packages/pebble/pool/process.py", line 82, in schedule
    self._check_pool_state()
  File "/home/cog/.shell/pyenv/versions/3.5.3/lib/python3.5/site-packages/pebble/pool/base_pool.py", line 93, in _check_pool_state
    raise RuntimeError('Unexpected error within the Pool')
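In the first traceback, `pool_monitor` polls each job with `job.future.result(0.01)` and relies on the exception it raises. `Future.result(timeout)` raises `TimeoutError` both when the task simply has not finished within the 0.01 s and when the pool has already stored a `TimeoutError('Task timeout', 15)` as the task's outcome, as happened here. A minimal sketch, assuming the job futures behave like `concurrent.futures.Future` (as the `concurrent/futures/_base.py` frames suggest), of a hypothetical `job_state` helper that separates these cases by checking `done()` before reading `exception()`:

```python
import concurrent.futures


def job_state(future):
    """Hypothetical pool-monitor helper: classify a job future without
    calling result(), which raises TimeoutError both for "still running"
    and for "the pool killed the task on timeout"."""
    if not future.done():
        return "running"
    if future.cancelled():
        # exception() would raise CancelledError, so check this first.
        return "cancelled"
    exc = future.exception()  # future is done, so this returns immediately
    if exc is None:
        return "ok"
    if isinstance(exc, concurrent.futures.TimeoutError):
        # e.g. TimeoutError('Task timeout', 15) set by the pool
        return "timed_out"
    return "failed"
```

With the states split out, a monitor could reschedule only `"timed_out"` jobs and report `"failed"` ones, instead of treating every `TimeoutError` the same way.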

starcraftman commented 6 years ago

Two problems here:
1) If this can crash, I should restart it. The bot is useless without a working Pool of processes.
2) This revealed a secondary problem. Some sync jobs requery objects from the db. If the database is dumped after the first attempt, retries of the job will fail to get a copy of the information they need. I may need to revisit pickling the db objects, or some other means of caching.
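For problem 2), one way to "cache" is to copy the needed values out of the db objects once, when the job is created, so that retries never requery. A minimal sketch, assuming the rows expose their columns as attributes (as ORM objects usually do); `SnapshotJob` and its `fields` parameter are hypothetical names, not part of cogBot:

```python
import pickle


class SnapshotJob:
    """Hypothetical job that snapshots its inputs from the db exactly once.

    fields names the row attributes the job needs; func receives a list of
    plain dicts built from that snapshot on every attempt.
    """

    def __init__(self, func, rows, fields):
        self.func = func
        # Copy only plain values (no ORM/session state) and pickle them; a
        # process pool also has to pickle whatever it sends to a worker.
        self.payload = pickle.dumps(
            [{field: getattr(row, field) for field in fields} for row in rows])
        self.attempts = 0

    def run(self):
        # Every attempt, including retries, rebuilds its input from the
        # snapshot rather than requerying tables that may have been dumped.
        self.attempts += 1
        return self.func(pickle.loads(self.payload))
```

The trade-off is staleness: a retry works with the data as it was at creation time, which is what you want if the db may be wiped between attempts.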

starcraftman commented 6 years ago

2) is fixed on master. I'm not sure 1) is as urgent, though a simple patch would probably be to catch the RuntimeError raised by schedule and, if it is thrown, make a new pool, then delay and reschedule.
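The patch described for 1) (catch the RuntimeError from schedule, make a new pool, delay, reschedule) can be sketched as a small wrapper. This is a minimal sketch, not cogBot code: `RestartingPool`, `pool_factory`, `delay` and `max_restarts` are hypothetical names, and the only thing assumed about the pool is the `schedule(func, timeout=...)` call shape seen in the traceback, plus an optional `stop()` method for shutting down the broken pool.

```python
import asyncio


class RestartingPool:
    """Hypothetical wrapper that rebuilds a broken process pool.

    pool_factory is a zero-argument callable returning an object whose
    schedule(func, timeout=...) returns a future, or raises RuntimeError
    once the pool is broken.
    """

    def __init__(self, pool_factory, delay=5.0, max_restarts=3):
        self.pool_factory = pool_factory
        self.delay = delay
        self.max_restarts = max_restarts
        self.pool = pool_factory()

    def _replace_pool(self):
        # Best-effort shutdown of the broken pool, only if it offers stop().
        stop = getattr(self.pool, "stop", None)
        if stop is not None:
            try:
                stop()
            except Exception:
                pass  # a broken pool may also fail to stop cleanly
        self.pool = self.pool_factory()

    async def schedule(self, func, timeout=None):
        """Schedule func; on RuntimeError make a new pool, delay, retry."""
        restarts = 0
        while True:
            try:
                return self.pool.schedule(func, timeout=timeout)
            except RuntimeError:
                if restarts >= self.max_restarts:
                    raise  # give up and let the caller handle it
                restarts += 1
                self._replace_pool()
                # Non-blocking delay so the bot's event loop keeps running.
                await asyncio.sleep(self.delay)
```

Since `start()` and `check_timeout()` in the traceback are plain methods, using an async wrapper like this would mean turning them into coroutines; a synchronous variant should avoid `time.sleep`, which would block the bot's event loop.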

starcraftman commented 6 years ago

I've decided not to do anything about 1) at the moment. I may change the way the jobs are run in the future, though.