procrastinate-org / procrastinate

PostgreSQL-based Task Queue for Python
https://procrastinate.readthedocs.io/
MIT License
859 stars 53 forks source link

Django database errors not properly wrapped #1141

Closed katlyn closed 2 months ago

katlyn commented 2 months ago

When using procrastinate within a Django application, it appears that database errors from Django are not wrapped correctly in all cases, particularly IntegrityErrors raised from queueing_lock. I believe the fix should be along the lines updating /procrastinate/contrib/django/djago_connector.py#44 to be similar to the following.

try:
    yeild
except django.db.IntegrityError as exc:
    raise exceptions.UniqueViolation(*exc.args)

Unfortunately, as far as I can tell there's not a reliable way to extract constraint_name or other information, and I don't have the bandwidth to try to attempt to test and implement a full solution at this point in time.

Steps to reproduce:

  1. Set up procrastinate with a Django project.
  2. Create a task with queueing_lock set.
  3. Attempt to queue multiple jobs at a time.
  4. Django's IntegrityError is raised instead of AlreadyEnqueued.

Workaround

Use except IntegrityError instead of AlreadyEnqueued.

paulzakin commented 2 months ago

Yup, we get that as well. @ewjoachim is there a way for it to fail gracefully without throwing an exception? Basically, say you have two worker processes. What happens is at some point Task #1 takes to long and process #1 stuck doing it. And process #2 is now freed up and tries to do another copy of Task #1 - it gets an IntegrityError and shuts down. The container running Procrastinate process #2 restarts, tries again in like five seconds, and basically does this until process #1 stops. I don't think anyone's code is wrong - it just seems like there is probably a better solution.

ewjoachim commented 2 months ago

Oof, it's much worse than I thought, sorry.

We should fix it, I don't think there's a user workaround.

paulzakin commented 2 months ago

No worries - it's not a big deal operationally for us!

davecoates commented 2 months ago

I don't think you can extract the constraint name directly from the IntegrityError, but could do something like

with wrap():
    try:
        yield
    except IntegrityError as exc:
        constraint_name = None
        if QUEUEING_LOCK_CONSTRAINT in str(exc):
            constraint_name = QUEUEING_LOCK_CONSTRAINT
        raise exceptions.UniqueViolation(*exc.args, constraint_name=constraint_name)

which seems to work

ewjoachim commented 2 months ago

I don't think you can extract the constraint name directly from the IntegrityError

@davecoates yes we can: django wraps the exception (pseudocode):

try:
     psycopg.stuff()
except psycopg.Exception as exc:
    raise DjangoIntegrityError from exc

Which means we get the original exception in __cause__

try:
    django.cursor.stuff()
except DjangoIntegrityError as exc:
    assert isinstance(DjangoIntegrityError.__cause__, DjangoIntegrityError)

so we can do exc.__cause__.diag.constraint_name

davecoates commented 2 months ago

Ahh right - so could just do this then?

with wrap():
    try:
        yield
    except DjangoIntegrityError as exc:
        # raise the original UniqueViolation
        raise exc.__cause__
ewjoachim commented 2 months ago

Ahh right - so could just do this then?

That's what we did precisley

Also, https://github.com/procrastinate-org/procrastinate/releases/tag/2.13.1 is released with the fix.

davecoates commented 2 months ago

Brilliant - thanks for the quick release