reanahub / reana-server

REANA API server
http://reana-server.readthedocs.io/
MIT License
5 stars 37 forks source link

rest: too many open connections to database #571

Open mdonadoni opened 1 year ago

mdonadoni commented 1 year ago

Many connections to the DB are opened by r-server when handling requests. These connections are not closed, probably because of connection pooling.

Given that there is a limit on the number of concurrent connections, it can happen that under heavy load some components are not able to reach the DB:

==> ERROR: Cannot create workflow :
(psycopg2.OperationalError) connection to server at "..." (...), port ... failed: FATAL:  remaining connection slots are reserved for non-replication superuser connections

How to reproduce:

  1. Deploy REANA locally in production mode
  2. Execute many workflows and send many requests to REANA, like so:

    $ for i in `seq 30`; do reana-client list &; done; wait
    $ for i in `seq 30`; do reana-client ls -w workflow &; done; wait
  3. Look at open connections (in this case 10.244.0.178 is r-server, while 10.244.0.5 is r-w-controller)

    reana=# select client_addr, count(*) from pg_stat_activity group by client_addr;
     client_addr  | count 
    --------------+-------
                  |     6
     10.244.0.178 |    46
     10.244.0.5   |     5
    (3 rows)
mdonadoni commented 1 year ago

This issue is (at least partially) caused by Invenio and REANA using different connection pools to connect to the DB.

This can be seen easily by setting the following reana-server's environment variables in values.yaml:

@@ -78,6 +78,8 @@ components:
       REANA_RATELIMIT_GUEST_USER: "20 per second"
       REANA_RATELIMIT_AUTHENTICATED_USER: "20 per second"
       REANA_RATELIMIT_SLOW: "1/5 second"
+      SQLALCHEMY_POOL_SIZE: 1
+      SQLALCHEMY_MAX_OVERFLOW: 0
     uwsgi:
       processes: 6
       threads: 4

In this case, the pool size used by REANA and Invenio is different, which means that different connection pools are being used:

$ kubectl -it exec deployment/reana-server -c rest-api -- invenio shell
In [1]: from invenio_db import db

In [2]: db.engine.pool.size()
Out[2]: 5

In [3]: from reana_db import database

In [4]: database.engine.pool.size()
Out[4]: 1

Note that Invenio uses Flask-SQLAlchemy extension to connect to the DB, so it is possible to modify Invenio's pool size by setting SQLALCHEMY_POOL_SIZE = 1 in reana_server/config.py.

At the moment I don't see which are the benefits of using different connection pools. Instead, there a few drawbacks to this approach: not only it increases the number of connections open with the DB, but it is also possible that multiple connections to the DB are opened to serve a single request. This also makes it impossible to modify REANA and Invenio tables in the same transaction.

mdonadoni commented 1 year ago

This is the number of connections depending on the number of uWSGI processes and threads.

With REANA's and Invenio's pool size equal to 1: Processes Threads Connections
3 4 7 (3 REANA + 3 Invenio + 1 scheduler)
3 8 7 (3 + 3 + 1)
6 4 13 (6 + 6 + 1)
6 8 13 (6 + 6 + 1)
With REANA's pool size set to 1 and Invenio's pool size equal to 5: Processes Threads Connections
3 4 16 (3 REANA + 3*4 Invenio + 1 scheduler)
3 8 19 (3 + 3*5 + 1)
6 4 31 (6 + 6*4 + 1)
6 8 37 (6 + 6*5 + 1)

Connections are shared between threads but not between processes. For this reason, it is not very useful to have a pool size bigger than the number of threads per process, as each thread will open one connection and none of them will be shared (e.g. with pool size equal to 5 and considering 4 threads, only 4 connections will be opened).

mdonadoni commented 1 week ago

Note that REANA now supports pgBouncer, so this issue has become less urgent