OperationalError: the connection is closed

paulzakin commented 2 months ago

Hey @ewjoachim and @medihack, here's an interesting error we hit from 2 AM to 10 AM this morning: OperationalError: the connection is closed. The stack trace is below. I think it might be related to #1134.

We have been using the connection pool feature in Django 5.1 (with python manage.py procrastinate worker) for a couple of weeks now, but this is the first time this has really happened. As far as I can tell, the worker kept reusing a dead connection for some reason, so every X seconds, whenever a task ran, this error was thrown. It's even more mysterious than that, though: some of the tasks succeeded, which suggests that one of the worker processes had a good connection and the other had a bad one (we run two worker processes on the same server). The fix was simply to kill those two workers and start two new ones, which solved the problem.
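For context, Django itself only recycles connections at request boundaries: it connects `django.db.close_old_connections()` to the `request_started` and `request_finished` signals. A worker that runs tasks in long-lived threads (the trace goes through `concurrent/futures/thread.py`) has no such boundary, so a dead connection may stay attached to a thread across tasks. One way to mimic the request cycle is a small decorator around each synchronous task. This is a minimal sketch: `with_connection_cleanup` is a hypothetical helper, not part of Procrastinate or Django, and it takes the cleanup function as a parameter so it can be shown without a configured Django project.

```python
import functools
from typing import Any, Callable


def with_connection_cleanup(
    reset: Callable[[], None],
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: call `reset` before and after the wrapped task.

    Assumption: in a Django worker, `reset` would be
    `django.db.close_old_connections`, which closes this thread's
    connections that are unusable or obsolete.
    """

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            reset()  # drop a stale connection left over from a previous task
            try:
                return func(*args, **kwargs)
            finally:
                reset()  # clean up even if the task raised

        return wrapper

    return decorator


if __name__ == "__main__":
    calls = []

    # Stand-in reset function that just records when it was called.
    @with_connection_cleanup(lambda: calls.append("reset"))
    def process_records():
        calls.append("task")
        return "done"

    print(process_records())  # done
    print(calls)  # ['reset', 'task', 'reset']
```

Because Django connections are per-thread, the decorator would have to wrap the synchronous function that actually runs in the executor thread (for example `recurring_event_record_processing` from the trace), with `django.db.close_old_connections` passed as `reset`.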

So, is there any way for Procrastinate to "get" a new connection from the pool when that error is thrown, rather than simply raising it? I was under the impression that Procrastinate, not Django, manages the pool for the worker, so this seems doable to me?
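The trace shows psycopg's error being caught by Django's `wrap_database_errors` (`django/db/utils.py`) and re-raised, and the closed connection is the one Django's ORM uses for the task's queries (`django/db/backends/postgresql/base.py`). So a retry can be attempted in task code, independently of Procrastinate. The sketch below is hypothetical (`retry_once_on` is not an existing API): when the task raises one of the listed exception types, it calls `reset()` and runs the task exactly once more.

```python
import functools
from typing import Any, Callable, Tuple, Type


def retry_once_on(
    exc_types: Tuple[Type[BaseException], ...],
    reset: Callable[[], None],
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: on `exc_types`, call `reset()` and retry once.

    Assumption: with Django, `exc_types` would be `(django.db.OperationalError,)`
    and `reset` would be `django.db.connection.close`.
    """

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return func(*args, **kwargs)
            except exc_types:
                reset()  # discard the broken connection
                return func(*args, **kwargs)  # a second failure propagates

        return wrapper

    return decorator


if __name__ == "__main__":

    class FakeOperationalError(Exception):
        pass

    state = {"alive": False, "resets": 0}

    def reset():
        state["resets"] += 1
        state["alive"] = True  # simulate acquiring a fresh connection

    @retry_once_on((FakeOperationalError,), reset)
    def process_records():
        if not state["alive"]:
            raise FakeOperationalError("the connection is closed")
        return "processed"

    print(process_records())  # processed
    print(state["resets"])  # 1
```

Closing the connection with `django.db.connection.close()` releases it (back to the pool when pooling is enabled), so the next query should acquire a new one. This is only appropriate for idempotent tasks, since the first attempt may have done part of its work before failing.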

OperationalError: the connection is closed
  File "django/db/backends/base/base.py", line 298, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/postgresql/base.py", line 433, in create_cursor
    cursor = self.connection.cursor()
  File "psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "psycopg/_connection_base.py", line 524, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
OperationalError: the connection is closed
  File "procrastinate/worker.py", line 281, in run_job
    task_result = await await_func(*job_args, **job.task_kwargs)
  File "server/blueprint.py", line 80, in wrapper
    raise error
  File "server/blueprint.py", line 53, in wrapper
    result = await (
  File "concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "line_profiler/line_profiler.py", line 150, in wrapper
    result = func(*args, **kwds)
  File "server/tasks/event_records.py", line 18, in recurring_event_record_processing
    EventStreamService.process_records()
  File "server/services/aws/event_stream.py", line 60, in process_records
    for event_record in event_records:
  File "django/db/models/query.py", line 400, in __iter__
    self._fetch_all()
  File "django/db/models/query.py", line 1928, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
  File "django/db/models/sql/compiler.py", line 1572, in execute_sql
    cursor = self.connection.cursor()
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/base/base.py", line 320, in cursor
    return self._cursor()
  File "django/db/backends/base/base.py", line 297, in _cursor
    with self.wrap_database_errors:
  File "django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/db/backends/base/base.py", line 298, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/postgresql/base.py", line 433, in create_cursor
    cursor = self.connection.cursor()
  File "psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "psycopg/_connection_base.py", line 524, in _check_connection_ok
    raise e.OperationalError("the connection is closed")