nextcloud / all-in-one

📦 The official Nextcloud installation method. Provides easy deployment and maintenance with most features included in this one Nextcloud instance.
https://hub.docker.com/r/nextcloud/all-in-one
GNU Affero General Public License v3.0
5.56k stars 643 forks source link

Postgres database running out of available connections #2833

Closed web3d0 closed 1 year ago

web3d0 commented 1 year ago

Steps to reproduce

  1. Select NextCloud on Digital Ocean as a starting droplet.
  2. Update NextCloud to the latest, and get it running.
  3. Wait for a day or two.

Expected behavior

NextCloud should continue running.

Actual behavior

The log file fills up with:

2023-06-21 16:03:50.654 ACST [4938] FATAL: remaining connection slots are reserved for non-replication superuser connections 2023-06-21 16:04:20.812 ACST [4948] FATAL: remaining connection slots are reserved for non-replication superuser connections

and Postgres shows a large number of idle connections.

Host OS

Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-75-generic x86_64)

Nextcloud AIO version

Nextcloud AIO v6.1.1

Current channel

latest

Other valuable info

Could be a mismatch between number of connections to keep open when pooling and the number of connections allowed by the database?

szaimen commented 1 year ago

Hi, which Nextcloud apps do you have installed?

web3d0 commented 1 year ago

It comes with a default set of "Apache" "Database" "Nextcloud" "Redis" "Collabora" and "Nextcloud Talk". I have disabled the "Talk" option, so "Collabora" is still running. None of the other optional packages of "Talk" "ClamAV" "Fulltextsearch" "Imaginary" or "Talk recording server" are running.

I'm comfortable on the command line, so I'm happy to probe into any avenues you suggest. If you can say how to get a psql connection to the Postgres database container, I can try and find where the connections are coming from?

szaimen commented 1 year ago

Can you post your Nextcloud logs here? You can get them via https://yourdomain.com/settings/admin/logging

web3d0 commented 1 year ago

Here is a sample of one of them, as the log file is very large:

[core] Error: Doctrine\DBAL\Exception: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[08006] [7] connection to server at "nextcloud-aio-database" (172.18.0.3), port 5432 failed: FATAL: remaining connection slots are reserved for non-replication superuser connections at <>

  1. /var/www/html/3rdparty/doctrine/dbal/src/Connection.php line 1531 OC\DB\Connection->connect()
  2. /var/www/html/3rdparty/doctrine/dbal/src/Connection.php line 1029 Doctrine\DBAL\Connection->getWrappedConnection()
  3. /var/www/html/lib/private/DB/Connection.php line 262 Doctrine\DBAL\Connection->executeQuery("SELECT * FROM \"oc_appconfig\"", [], [], null)
  4. /var/www/html/3rdparty/doctrine/dbal/src/Query/QueryBuilder.php line 345 OC\DB\Connection->executeQuery("SELECT * FROM \"oc_appconfig\"", [], [])
  5. /var/www/html/lib/private/DB/QueryBuilder/QueryBuilder.php line 280 Doctrine\DBAL\Query\QueryBuilder->execute()
  6. /var/www/html/lib/private/AppConfig.php line 418 OC\DB\QueryBuilder\QueryBuilder->execute()
  7. /var/www/html/lib/private/AppConfig.php line 184 OC\AppConfig->loadConfigValues()
  8. /var/www/html/lib/private/AppConfig.php line 374 OC\AppConfig->getApps()
  9. /var/www/html/lib/private/legacy/OC_App.php line 976 OC\AppConfig->getValues(false, "installed_version")
  10. /var/www/html/lib/private/Server.php line 729 OC_App::getAppVersions()
    1. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 171 OC\Server->OC{closure}(" sensitive parameters replaced ")
    2. /var/www/html/3rdparty/pimple/pimple/src/Pimple/Container.php line 122 OC\AppFramework\Utility\SimpleContainer->OC\AppFramework\Utility{closure}(" sensitive parameters replaced ")
    3. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 138 Pimple\Container->offsetGet("OC\Memcache\Factory")
    4. /var/www/html/lib/private/ServerContainer.php line 171 OC\AppFramework\Utility\SimpleContainer->query("OC\Memcache\Factory", true)
    5. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 65 OC\ServerContainer->query("OC\Memcache\Factory")
    6. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 193 OC\AppFramework\Utility\SimpleContainer->get("OC\Memcache\Factory")
    7. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 171 OC\AppFramework\Utility\SimpleContainer->OC\AppFramework\Utility{closure}(" sensitive parameters replaced ")
    8. /var/www/html/3rdparty/pimple/pimple/src/Pimple/Container.php line 118 OC\AppFramework\Utility\SimpleContainer->OC\AppFramework\Utility{closure}(" sensitive parameters replaced ")
    9. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 138 Pimple\Container->offsetGet("OCP\ICacheFactory")
    10. /var/www/html/lib/private/ServerContainer.php line 171 OC\AppFramework\Utility\SimpleContainer->query("OCP\ICacheFactory", true)
    11. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 65 OC\ServerContainer->query("OCP\ICacheFactory")
    12. /var/www/html/lib/private/Server.php line 1110 OC\AppFramework\Utility\SimpleContainer->get("OCP\ICacheFactory")
    13. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 171 OC\Server->OC{closure}(" sensitive parameters replaced ")
    14. /var/www/html/3rdparty/pimple/pimple/src/Pimple/Container.php line 122 OC\AppFramework\Utility\SimpleContainer->OC\AppFramework\Utility{closure}(" sensitive parameters replaced ")
    15. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 138 Pimple\Container->offsetGet("OCP\Lock\ILockingProvider")
    16. /var/www/html/lib/private/ServerContainer.php line 171 OC\AppFramework\Utility\SimpleContainer->query("OCP\Lock\ILockingProvider", true)
    17. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 65 OC\ServerContainer->query("OCP\Lock\ILockingProvider")
    18. /var/www/html/lib/private/Server.php line 2065 OC\AppFramework\Utility\SimpleContainer->get("OCP\Lock\ILockingProvider")
    19. /var/www/html/lib/private/Files/View.php line 119 OC\Server->getLockingProvider()
    20. /var/www/html/lib/private/Server.php line 464 OC\Files\View->__construct()
    21. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 171 OC\Server->OC{closure}(" sensitive parameters replaced ")
    22. /var/www/html/3rdparty/pimple/pimple/src/Pimple/Container.php line 122 OC\AppFramework\Utility\SimpleContainer->OC\AppFramework\Utility{closure}(" sensitive parameters replaced ")
    23. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 138 Pimple\Container->offsetGet("OC\Files\Node\HookConnector")
    24. /var/www/html/lib/private/ServerContainer.php line 171 OC\AppFramework\Utility\SimpleContainer->query("OC\Files\Node\HookConnector", true)
    25. /var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php line 65 OC\ServerContainer->query("OC\Files\Node\HookConnector")
    26. /var/www/html/lib/private/Server.php line 1467 OC\AppFramework\Utility\SimpleContainer->get("OC\Files\Node\HookConnector")
    27. /var/www/html/lib/base.php line 625 OC\Server->boot()
    28. /var/www/html/lib/base.php line 1167 OC::init()
    29. /var/www/html/index.php line 34 require_once("/var/www/html/lib/base.php")

POST /apps/text/session/sync from 1.124.107.142 at 2023-06-21T04:40:30+00:00

However I'm sure this is just a connection error after the problem has happened. I'm looking for the first error to see what was immediately prior to that. EDIT: No, it goes straight from a reboot to being out of database connections a few hours later.

web3d0 commented 1 year ago

So from my poking at it so far, I'd like to know what's holding the database connections open. There's about 30 connections just sitting idle, so I could modify the postgres configuration to kill connections after 5 minutes... but why are those connections there in the first place? Are they pooled and waiting for use, in which case the problem is a mis-matching number? Or is something failing to run to completion, so killing the connections would hide the underlying problem?

szaimen commented 1 year ago

I suppose this is happening due to one of the installed apps. Can you run sudo docker exec --user www-data -it nextcloud-aio-nextcloud php occ app:list and post the output here?

web3d0 commented 1 year ago

root@nextcloud:~# docker exec --user www-data -it nextcloud-aio-nextcloud php occ app:list Enabled:

This should be a clean install of Nextcloud, as it was a Digital Ocean droplet which was installed and upgraded. I did try and sync a directory which had node_modules in it, so it thrashed for a bit on the thousands of small files and crashed. I cleared the directory and kept going, but maybe this is a hangover issue?

szaimen commented 1 year ago

What kind of hardware do you have? How many CPUs and how many RAM?

web3d0 commented 1 year ago

1 core, 2GB RAM, 50GB disk. 33% disk usage, 4% CPU usage, 50% memory usage. 50k/sec disk usage (4 people who barely use it), and 4GB swapspace with 0% used.

szaimen commented 1 year ago

1 core, 2GB RAM

I would suggest to upgrade to 2vCPU and 4GB RAM, then you should be safe. Otherwise you should probably disable all additional addons as suggested in the AIO interface.