jcass77 / django-apscheduler

APScheduler for Django
MIT License

Django APScheduler

APScheduler for Django.

This is a Django app that adds a lightweight wrapper around APScheduler. It enables storing persistent jobs in the database using Django's ORM.

django-apscheduler is a great choice for quickly and easily adding basic scheduling features to your Django applications with minimal dependencies and very little additional configuration. The ideal use case probably involves running a handful of tasks on a fixed execution schedule.

PLEASE NOTE: the trade-off of this simplicity is that you need to be careful to ensure that you have only ONE scheduler actively running at a particular point in time.

This limitation exists because APScheduler does not currently have any interprocess synchronization and signalling scheme that would enable the scheduler to be notified when a job has been added, modified, or removed from a job store. In other words, different schedulers won't be able to tell if a job has already been run by another scheduler, and changing a job's scheduled run time directly in the database does nothing unless you also restart the scheduler.

Depending on how you are currently doing your Django deployments, working with this limitation might require a bit of thought. It is quite common to start up many webserver worker processes in production environments in order to scale and handle large volumes of user traffic. If each of these worker processes ends up running its own scheduler, then jobs can be missed or executed multiple times, and duplicate entries can be created in the DjangoJobExecution table.

Support for sharing a persistent job store between multiple schedulers appears to be planned for an upcoming APScheduler 4.0 release.

So for now your options are to either:

  1. Use a custom Django management command to start a single scheduler in its own dedicated process (recommended - see the runapscheduler.py example below); or

  2. Implement your own remote processing logic to ensure that a single DjangoJobStore can be used by all of the webserver's worker processes in a coordinated and synchronized way (might not be worth the extra effort and increased complexity for most use cases); or

  3. Select an alternative task processing library that does support inter-process communication using some sort of shared message broker like Redis, RabbitMQ, Amazon SQS or the like (see: https://djangopackages.org/grids/g/workers-queues-tasks/ for popular options).

Features

Installation

pip install django-apscheduler

Quick start

- Set `APSCHEDULER_RUN_NOW_TIMEOUT` in your Django settings, for example `APSCHEDULER_RUN_NOW_TIMEOUT = 25  # Seconds`. This is the maximum run time allowed for jobs that are triggered manually via the Django admin site, which prevents admin site HTTP requests from timing out. Longer running jobs should probably be handed over to a background task processing library that supports multiple background worker processes instead (e.g. Dramatiq, Celery, Django-RQ, etc. See: https://djangopackages.org/grids/g/workers-queues-tasks/ for popular options).
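- Make sure the app is listed in `INSTALLED_APPS` so that Django can find its models and migrations. A minimal `settings.py` sketch, assuming the app label `django_apscheduler` (the package name that the example command imports from); the other entries are placeholders for whatever your project already lists:

```python
# settings.py (fragment)
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    # ... your other apps ...
    "django_apscheduler",
]
```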


- Run `python manage.py migrate` to create the django_apscheduler models.

- Add a [custom Django management command](https://docs.djangoproject.com/en/dev/howto/custom-management-commands/) to your project
  that schedules the APScheduler jobs and starts the scheduler:

```python
# runapscheduler.py
import logging

from django.conf import settings

from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger
from django.core.management.base import BaseCommand
from django_apscheduler.jobstores import DjangoJobStore
from django_apscheduler.models import DjangoJobExecution
from django_apscheduler import util

logger = logging.getLogger(__name__)

def my_job():
  # Your job processing logic here...
  pass

# The `close_old_connections` decorator ensures that database connections that have become
# unusable or obsolete are closed before and after your job has run. You should use it
# to wrap any scheduled job that accesses the Django database in any way.
@util.close_old_connections
def delete_old_job_executions(max_age=604_800):
  """
  This job deletes APScheduler job execution entries older than `max_age` from the database.
  It helps to prevent the database from filling up with old historical records that are no
  longer useful.

  :param max_age: The maximum length of time, in seconds, to retain historical job execution
                  records. Defaults to 7 days (604,800 seconds).
  """
  DjangoJobExecution.objects.delete_old_job_executions(max_age)

class Command(BaseCommand):
  help = "Runs APScheduler."

  def handle(self, *args, **options):
    scheduler = BlockingScheduler(timezone=settings.TIME_ZONE)
    scheduler.add_jobstore(DjangoJobStore(), "default")

    scheduler.add_job(
      my_job,
      trigger=CronTrigger(second="*/10"),  # Every 10 seconds
      id="my_job",  # The `id` assigned to each job MUST be unique
      max_instances=1,
      replace_existing=True,
    )
    logger.info("Added job 'my_job'.")

    scheduler.add_job(
      delete_old_job_executions,
      trigger=CronTrigger(
        day_of_week="mon", hour="00", minute="00"
      ),  # Midnight on Monday, before start of the next work week.
      id="delete_old_job_executions",
      max_instances=1,
      replace_existing=True,
    )
    logger.info(
      "Added weekly job: 'delete_old_job_executions'."
    )

    try:
      logger.info("Starting scheduler...")
      scheduler.start()
    except KeyboardInterrupt:
      logger.info("Stopping scheduler...")
      scheduler.shutdown()
      logger.info("Scheduler shut down successfully!")
```

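The default `max_age=604_800` in the example is seven days expressed in seconds. If a different retention period is wanted, the value can be derived rather than hard-coded. The sketch below uses only the standard library; `retention_seconds` is a hypothetical helper name, not part of django-apscheduler:

```python
from datetime import timedelta


def retention_seconds(days: int) -> int:
    """Convert a retention period in days to a number of seconds (hypothetical helper)."""
    return int(timedelta(days=days).total_seconds())


# Seven days matches the default used by `delete_old_job_executions` in the example.
assert retention_seconds(7) == 604_800

print(retention_seconds(30))  # 2592000
```

The result could then be passed as `max_age` when scheduling the clean-up job, assuming you want to keep execution records for longer or shorter than a week.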
Advanced usage

django-apscheduler assumes that you are already familiar with APScheduler and its proper use. If not, then please head over to the project page and have a look through the APScheduler documentation.

It is possible to make use of different types of schedulers depending on your environment and use case. If you would prefer running a BackgroundScheduler instead of using a BlockingScheduler, then you should be aware that using APScheduler with uWSGI requires some additional configuration steps in order to re-enable threading support.

Supported databases

Please take note of the list of databases that are officially supported by Django. django-apscheduler probably won't work with unsupported databases like Microsoft SQL Server, MongoDB, and the like.

Database connections and timeouts

django-apscheduler is dependent on the standard Django database configuration settings. These settings, in combination with how your database server has been configured, determine how connection management will be performed for your specific deployment.
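As an illustration of the kind of settings meant here, a hedged `settings.py` fragment follows. `CONN_MAX_AGE` is Django's standard connection lifetime setting; the engine, database name, and the 600 second value are placeholders chosen for this sketch, not recommendations:

```python
# settings.py (fragment); all values are placeholders
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        # Number of seconds a connection may be reused before Django closes it.
        # 0 closes the connection at the end of each request; None keeps it open
        # indefinitely.
        "CONN_MAX_AGE": 600,
    }
}
```

Whatever value is chosen should be lower than any idle timeout enforced by the database server, otherwise the server may drop connections that Django still considers usable.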

The close_old_connections decorator should be applied to APScheduler jobs that require database access. Doing so ensures that Django's CONN_MAX_AGE configuration setting is enforced before and after your job is run. This mirrors the standard Django functionality of doing the same before and after handling each HTTP request.
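The "before and after" behaviour can be pictured with a small sketch. This is not the library's implementation: `run_before_and_after` is a hypothetical helper, and a stand-in clean-up function is used so that the example runs without Django. In a real project the clean-up call would be `django.db.close_old_connections`.

```python
from functools import wraps


def run_before_and_after(cleanup):
    """Hypothetical helper: build a decorator that calls `cleanup` around a job."""

    def decorator(job):
        @wraps(job)
        def wrapper(*args, **kwargs):
            cleanup()  # discard stale connections before the job starts
            try:
                return job(*args, **kwargs)
            finally:
                cleanup()  # runs again afterwards, even if the job raised

        return wrapper

    return decorator


calls = []


# Stand-in for `django.db.close_old_connections`, so the sketch needs no Django.
@run_before_and_after(lambda: calls.append("cleanup"))
def my_job():
    calls.append("job")


my_job()
print(calls)  # ['cleanup', 'job', 'cleanup']
```

Using `try`/`finally` means the second clean-up still happens when the job fails, which is the same guarantee Django gives around each HTTP request.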

If you still encounter any kind of 'lost database connection' errors then it probably means that:

Common footguns

Unless you have a very specific set of requirements, and have intimate knowledge of the inner workings of APScheduler, you really shouldn't be using BackgroundScheduler. Doing so can lead to all sorts of temptations like:

Relying on BlockingScheduler forces you to run APScheduler in its own dedicated process that is not handled or monitored by the webserver. The example code provided in runapscheduler.py above is a good starting point.

Project resources