networkjanitor opened this issue 2 years ago
I think this is also what https://github.com/arteria/django-background-tasks/issues/151 is about, but that issue does not have any description and is four years old by now.
Hi, thanks for the detailed issue. Yes, there are a number of issues with django-background-tasks which cause knock-on issues. There is a (long-running) process to replace django-background-tasks with celery, the hold-up being that it's a lot of work to switch without breaking anyone's existing tasks and install, as I basically need to port over all existing tasks to celery and a heartbeat. This is also holding up moving a load of the heavy tasks (deleting a big source causing a 503/timeout, etc.) to background tasks. For an immediate work-around, you can use the reset-tasks command line option; this will delete all the existing (including stale) django-background-tasks tasks and re-create any as needed as new tasks.
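If you'd rather trigger that from Python than from the manage.py CLI, something along these lines should work (untested sketch; the settings module name is a guess for a standard install):

# Untested sketch: run the reset-tasks management command from a small
# Python maintenance script instead of invoking manage.py directly.
import os
import django

# Assumption: adjust the settings module to whatever your install actually uses.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'tubesync.settings')
django.setup()

from django.core.management import call_command
call_command('reset-tasks')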
> For an immediate work-around, you can use the reset-tasks command line option, this will delete all the existing (including stale) django-background-tasks tasks and re-create any as needed as new tasks.
That's exactly what I have been using (in crontab by now), but it's hardly working. Right now the task queue usually deadlocks while reset-tasks is still running, if not shortly after reset-tasks finishes, leaving behind ~2500 unprocessed tasks. Which means that for each handful of newly processed tasks (i.e. metadata/data downloads), it needs to re-index all sources again and again.
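A quick way to inspect that backlog from the Django shell (just a rough sketch, using only the django-background-tasks model):

# Rough diagnostic sketch: count queued and currently locked tasks.
from django.utils import timezone
from background_task.models import Task

print('total tasks in the queue:', Task.objects.count())
print('currently locked tasks:', Task.objects.locked(timezone.now()).count())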
Ah, fair enough. Unfortunately it's unlikely I'm going to be able to fix an unmaintained upstream package to resolve this bug directly. I'd not heard of this particular issue before, and not one where reset-tasks didn't at least temporarily fix it. Relying on it in cron should obviously be overkill, but it sadly seems required for some installations. I'll update this issue with details on the migration to celery if you like.
For now I'm using the following new command (placed at /app/sync/management/commands/unlock-tasks.py) to unlock (by deleting and re-creating) any tasks locked by dead processes:
from django.core.management.base import BaseCommand
from django.utils.translation import gettext_lazy as _
from django.utils import timezone

from background_task.models import Task

from common.logger import log


class Command(BaseCommand):

    help = 'Unlocks jobs that are locked by dead processes.'

    def handle(self, *args, **options):
        log.info('Fetching locked tasks...')
        # Iterate over all currently locked tasks
        locked_tasks = Task.objects.locked(timezone.now())
        for locked_task in locked_tasks:
            if locked_task.locked_by_pid_running():
                log.info(f'Task is still fine: {locked_task}')
            else:
                log.info(f'Task needs unlocking: {locked_task}')
                if locked_task.is_repeating_task():
                    # Deleting repeating tasks is bad (would result in no more source
                    # index tasks), so we schedule the next repetition before deletion.
                    # This matches the flow in the bg_runner (w/o signals).
                    log.info(f'Re-scheduling repeating task: {locked_task}')
                    locked_task.create_repetition()
                # Deleting a locked, non-repeating task should not be too bad,
                # as it should be re-created by the source index task again.
                log.info(f'Deleting task: {locked_task}')
                locked_task.delete()
                ## Unlocking by setting locked_by and locked_at to None
                ## results in source index tasks being repeated over and over,
                ## since those seem to be the ones crashing process_tasks for me.
                ## Re-execution of the same source index task results in duplicate
                ## download tasks being scheduled, without them ever being executed.
                #locked = locked_tasks.filter(pk=locked_task.pk)
                #locked.update(locked_by=None, locked_at=None)
        log.info('Done')
This really seems like a last-resort workaround, as I'm deleting the non-repeating tasks instead of rescheduling them. It seems to work so far, and it also seems like only the (repeating) source index tasks were crashing the process_tasks process. I'll observe this for a few days and note anything else that comes to mind - just in case someone else stumbles across this issue (and can't solve it by setting TZ correctly, switching away from sqlite or using reset-tasks to fix it), at least until the celery replacement is ready :)
Thanks for the command for anyone else who stumbles over these sorts of issues. Most helpful!
@networkjanitor Good job, thank you!
@meeb should @networkjanitor's command be added to the repo?
Situation:

Sometimes the process_tasks process gets killed/exits (for whatever reason) and is then restarted by s6. However, if the original process had locked tasks in the db, then these do not get unlocked, not on restart and not after they have expired. Should the amount of locked 'dead' tasks be equal to BACKGROUND_TASK_ASYNC_THREADS, then the whole task queue is deadlocked until tasks are reset.

As far as I can tell, this is because of a bug in django-background-tasks, related to: https://github.com/arteria/django-background-tasks/issues/239

When using BACKGROUND_TASK_ASYNC_THREADS, tasks never expire as per the linked issue. So even if there are locked tasks in the db (locked by processes which no longer exist), they won't get unlocked. Same for tasks running longer than MAX_RUN_TIME (that's what the linked issue is actually about).

Ideas to solve this or work around this:

- Fix the bug in django-background-tasks, or patch it in the version you are using (since it is unmaintained).
- Check locked tasks with task.locked_by_pid_running() and unlock them, should it return false (a rough sketch of this follows below).
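A rough, untested sketch of the second idea, run from the Django shell. The BACKGROUND_TASK_ASYNC_THREADS default used below is just a guess; everything else uses only the django-background-tasks fields and methods already mentioned in this thread:

# Untested sketch: find tasks locked by processes that no longer exist and,
# if there are enough of them to stall every worker thread, unlock them.
from django.conf import settings
from django.utils import timezone
from background_task.models import Task

now = timezone.now()
dead_locked = [t for t in Task.objects.locked(now) if not t.locked_by_pid_running()]

# Default of 1 is an assumption; the real default depends on the package version.
threads = getattr(settings, 'BACKGROUND_TASK_ASYNC_THREADS', 1)
if len(dead_locked) >= threads:
    print(f'Likely deadlocked: {len(dead_locked)} dead locked tasks vs {threads} worker threads')

for task in dead_locked:
    # Plain unlocking; note the comments in the unlock-tasks command above found
    # that deleting and re-creating works better for repeating source index tasks.
    Task.objects.filter(pk=task.pk).update(locked_by=None, locked_at=None)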