opensafely-core / job-server

A server for mediating jobs that can be run in an OpenSAFELY secure environment. q.v. job-runner
https://jobs.opensafely.org
Other
5 stars 11 forks source link

Deprecate the `runjobs` management command #4470

Open iaindillingham opened 4 months ago

iaindillingham commented 4 months ago

The runjobs management command allows us to group jobs by their schedule (daily, weekly, monthly, etc.), and make one call for the schedule rather than one call for each job. Doing so requires us to create a module for each schedule (e.g. jobserver.jobs.daily), and a module for each job (e.g. jobserver.jobs.daily.my_job) with the following structure:

from django_extensions.management.jobs import DailyJob

class Job(DailyJob):
    help = "My daily job"

    def execute(self):
        do_something()

Notice that the schedule is encoded in the module name (jobserver.jobs.daily) and the parent class (DailyJob). It's not clear which component is used by runjobs. Also notice that runjobs doesn't actually run the jobs on the schedule: it's a management command for running the jobs associated with the schedule. We use Dokku managed cron to actually run the jobs on the schedule; we configure it in app.json.

In other words, there are three locations that encode the schedule, but only one that matters. This is confusing.

The overhead of grouping jobs by their schedule might be tolerable, if there were many jobs for each schedule. However, there are not. There is one hourly job, two daily jobs, and one weekly job. All seem like they would make good Django management commands. Indeed, one is a Django management command!

It would be better to convert the jobs to management commands, and call them directly from app.json. This would allow us to deprecate the runjobs management command.

Implementation considerations

Converting the jobs to management commands wouldn't change our approach to monitoring them with Sentry Crons. Rather than wrapping Job.execute with sentry_sdk.crons.monitor, we'd wrap Command.handle.

You may argue that doing so would involve coupling the notion of a management command to the notion of a schedule. However, the runjobs management command does something similar; it couples the notion of a job to the notion of a schedule, because it doesn't actually run the jobs on the schedule.

If we are concerned about coupling, then we could create a management command for running other management commands, and wrap this management command's Command.handle with sentry_sdk.crons.monitor. app.json would look something like this:

{
  "cron": [
    {
      "command": "python manage.py runcrons hourly management_command_1 management_command_2",
      "schedule": "@hourly"
    },
    {
      "command": "python manage.py runcrons daily management_command_3",
      "schedule": "@daily"
    }
  ]
}

(Notice that app.json doesn't have entries it doesn't need: no management commands are run on weekly or monthly schedules.)

This approach may involve creating a decorator. For more information about creating decorators, see Chapter 9 of Fluent Python, Second Edition, "Decorators and Closures". (We can access this chapter using our Oxford SSOs.)

Ideally, we'd configure Sentry Crons alongside Dokku managed cron, but that doesn't seem to be possible.

lucyb commented 3 months ago

This will be a good task for the next round of deck scrubbing. However, we're now starting to focus on the next initiative and won't pick it up for a while, so I'm going to move it off the board.