procrastinate-org / procrastinate

PostgreSQL-based Task Queue for Python
https://procrastinate.readthedocs.io/
MIT License
840 stars 52 forks source link

Use additional abort field on jobs table for abortion requests #1139

Closed medihack closed 1 month ago

medihack commented 1 month ago

Closes #1136

Successful PR Checklist:

PR label(s):

onlyann commented 1 month ago

Looking good. What about naming the column abort_requested?

Alternatively, it could be a nullable timestamp abort_requested_at that can be inferred as a flag but also conveys the time it is requested in the job table.

medihack commented 1 month ago

Looking good. What about naming the column abort_requested?

Yes, that sounds good. I renamed it to abort_requested.

Alternatively, it could be a nullable timestamp abort_requested_at that can be inferred as a flag but also conveys the time it is requested in the job table.

For the sake of simplicity, I would keep it a boolean flag. I make sure that the event of an abortion request is recorded (and the event itself has a timestamp). Are you okay with that?

github-actions[bot] commented 1 month ago

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  procrastinate
  job_context.py
  manager.py
  testing.py
  worker.py
  procrastinate/contrib/django
  models.py
Project Total  

This report was generated by python-coverage-comment-action

medihack commented 1 month ago

Some design decisions:

I will switch the branch to merge against when we have a new branch for our beta.

onlyann commented 1 month ago

The migration becomes quite long as it is not possible to remove a value (in this case, aborting) from an enum type. A new enum type must be created, the old one must be swapped out, and all dependencies (indexes, functions, trigger) must be recreated. I tried to add good comments to make things clear.

That is the main drawback of postgres enum. They are not designed to have their values deleted

I haven't intentionally made abort_requested available on the Job itself because if the user checks job.abort_requested of a passed context inside a task (in contrast to calling should_abort), this information is not up-to-date.

And as long as the Job class is marked as frozen, we can't simply change the flag.

Here is a thought on how the should_abort could work:

The task doesn't call the database through the JobManager. Instead:

The worker maintains a set of the job ids that should be aborted. The JobContext exposes a function that returns True if the job id is in that set.

This means the tasks can check through the JobContext without introducing database overhead.

For async task, the worker could call cancel on the asyncio Task. The user code can then handle CancellationError to control abortion.

medihack commented 1 month ago

Here is a thought on how the should_abort could work:

The task doesn't call the database through the JobManager. Instead:

  • a side task from the worker polls the database at a configured interval for any job that should be aborted (and it would only be interested in the jobs that it currently runs)
  • additionally, a listener is notified every time a job is requested to be aborted which provides low latency.

The worker maintains a set of the job ids that should be aborted. The JobContext exposes a function that returns True if the job id is in that set.

This means the tasks can check through the JobContext without introducing database overhead.

For async task, the worker could call cancel on the asyncio Task. The user code can then handle CancellationError to control abortion.

Yes, that all sounds reasonable. But let's start by getting this PR (as it is) in the v3 branch, then the worker refactoring, and then start from there. At least we have a solid base, then. @ewjoachim Do you agree?

ewjoachim commented 1 month ago

I agree. @onlyann your proposal seems to match what #1084 describes.

ewjoachim commented 1 month ago

(is the PR ready for review after the worker refactor ?)

medihack commented 1 month ago

(is the PR ready for review after the worker refactor ?)

Yes, I think so.