Open paulzakin opened 2 months ago
If I understand it correctly, the Procrastinate worker uses its own connection pool independent of the Django connection pool (it just fetches the connection parameters from Django). But I wonder if this error originates from the worker or from within the task and thereby a Django connection (from your traceback: File "django/db/models/query.py", line 400, in __iter__
). Are those very long-running tasks?
Very possibly! But now I’m confused. Are you saying the task itself is executed using a Django connection, rather than a procrastinate one? But not all the tasks are long running and this has only happened now over the past couple of weeks so it is odd for sure. What would happen is every minute a short task would run (10 seconds or less) and if used a dead connection effectively. Maybe there is way within Procrastinate to force the connection to release? The stack trace bubbles it from Django to Procrastinate
Very possibly! But now I’m confused. Are you saying the task itself is executed using a Django connection, rather than a procrastinate one?
Yes, I think so. Procrastinate extracts the connection parameters from a Django connection, but then uses its own connection pool for everything that is worker related. However, when you access the Django ORM in a task, it uses the Django connections (or that pool if it is set up in Django 5.1).
As a sync task is run in its own thread, the Django connections won't be automatically closed (there is no request-response cycle like in a Django view where it is automatically closed). So, from my understanding (and when reading some of the many issues on the web), one has to close them manually. That's why we call django.db.close_old_connections()
in our tasks before an expected long-running calculation so that when the connections are re-opened when needed again. As stated in #1134 CONN_MAX_AGE
may work, too. Procrastinate can't do anything about that as it has no information on when those Django connections are needed and when they are not.
@ewjoachim When Procrastinate fetches the connection parameters from Django it creates a new (Django) connection. I wonder if we should explicitly close it after we got the parameters. I don't think it has anything to do with the reported issue, but it could still be problematic as this connection may also time out.
This might be a solid lead.
I dug a bit deeper, but I was wrong. When the connection parameters are fetched from Django, no connection is opened. Contrary to the name create_connection,
it just creates a database wrapper that fetches the database parameters from the Django settings (that's why its named wrapper
in Procrastinate 😂).
@paulzakin I still recommend manually closing those Django connections. Please report back if that issue nevertheless occurs.
Huh! Ok. Weird. The pool in Django I thought would do that. It's a new feature though - maybe it is a bit buggy on there end. Would there be a good place within Procrastinate to close those connections you guys think?
Or should I try to intercept them somewhere else
Huh! Ok. Weird. The pool in Django I thought would do that. It's a new feature though - maybe it is a bit buggy on there end. Would there be a good place within Procrastinate to close those connections you guys think?
Or should I try to intercept them somewhere else
I can't say much about the new pool feature, but it looks like the errors we had to deal with (without using the pool setting). Those entirely disappeared when we manually closed the connections at specific points in our task. But I also saw some code somewhere where they caught the error, closed the connection, and retried again.
I also wonder how the Django pool feature is related to CONN_MAX_AGE
and CONN_HEATH_CHECKS
(see "Connection management". The main problem is that Django database connections are built around the request-response cycle, which is not present inside a background task.
When I find time, I can play a bit with long-running tasks that access the Django ORM (with and without the pool option). Do you use the default pool ("pool": True
) or pass in you own pool?
No rush, and thanks a bunch! We use the default pool: True - nothing fancy. We can do the close function at the end of each long running task, which makes sense - but I do feel this is something other Procrastinate users will stumble into for sure. Probably worth having some logic in Procrastinate that when raise e.OperationalError("the connection is closed") by Procrastinate, it closes the Django connection as well? That might be what we end up implementing if it happens a bunch more.
Hey @ewjoachim and @medihack an interesting error we got from 2 AM to 10 AM this morning. OperationalError: the connection is closed. Here is the stack trace (below). I think it might be related to #1134. We have been using the pool feature in Django 5.1 for a couple of weeks now (and
python manage.py procrastinate worker
) but this is the first time this real happended. What happended, as far as I can tell, is that the worker kept reusing a dead connection for some reason. So every X seconds that a task was run, this error would be thrown. But I think it is even more mysterious then that, because some of the tasks succeeded, indictating that one of the worker processes had a good connection and the other had a bad one (we run two processes on the same server). The solution was simply kill those two workers and start two new ones, and that solved the problem.So, is there any way that when that error is thrown, Procrastinate can "get" a new connection from the pool rather than simply throwing an error to avoid this problem? I was under the impression that Procrastinate manages the pool for the worker, not Django, so this seems doable to me?