Closed scottyhq closed 3 years ago
A starting point is the `rolconnlimit` column in the `pg_roles` view:
https://www.postgresql.org/docs/current/view-pg-roles.html
Check what value is currently set in there.
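A minimal sketch of that check, assuming a psycopg2-style `%s` placeholder and a role name you supply yourself. The helper name `describe_role_limit` is made up for illustration; the only PostgreSQL fact relied on is that `rolconnlimit = -1` means "no limit".

```python
# Query for the per-role connection limit; run it with psql or any client.
# The %s placeholder follows the psycopg2 parameter style (an assumption).
ROLE_LIMIT_QUERY = (
    "SELECT rolname, rolconnlimit FROM pg_roles WHERE rolname = %s;"
)


def describe_role_limit(rolconnlimit):
    """Turn a rolconnlimit value into a readable description."""
    # PostgreSQL documents -1 as "no limit" for this column.
    if rolconnlimit == -1:
        return "unlimited"
    return f"at most {rolconnlimit} concurrent connections"
```

If the role turns out to be unlimited, the bottleneck is more likely the server-wide `max_connections` setting than the role itself.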
Not sure how JupyterHub on k8s affects things here. Note that each node has a unique internal IP (`hostname -I | awk '{print $1}'` --> 172.AA.AAA.AA) and each pod has a unique internal IP (172.BB.BBB.BB), but everyone seems to have the same external IP (`curl https://ipinfo.io/ip` --> 44.CCC.CC.CCC) regardless of where their server is running.
I think what @jomey has pointed out here is what is going on. Something I didn't think of.
Well, I checked what the user had for max connections, and it was unlimited.
As in the name of this issue @scottyhq made, the postgres config has a `max_connections` of... 100 people. I just changed it to 500.
I think the connections != people.
It's the notebook that establishes a connection, so if you have 70 people each working on 4 parallel notebooks, then you will have 280 connections unless they shut down or restart their kernels.
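The arithmetic above can be sketched as follows. This assumes one connection per notebook kernel, which is a simplification: a SQLAlchemy pool can open more than one. It also assumes PostgreSQL's default `superuser_reserved_connections` of 3, which is what the "remaining connection slots are reserved" error refers to. The function names are hypothetical.

```python
def expected_connections(users, notebooks_per_user, connections_per_notebook=1):
    """Estimate how many connections the workshop opens at once."""
    return users * notebooks_per_user * connections_per_notebook


def slots_for_regular_roles(max_connections, superuser_reserved=3):
    """Slots left for non-superuser roles.

    superuser_reserved mirrors superuser_reserved_connections,
    assumed here to be at its default of 3.
    """
    return max_connections - superuser_reserved


demand = expected_connections(users=70, notebooks_per_user=4)
fits_old_limit = demand <= slots_for_regular_roles(100)
fits_new_limit = demand <= slots_for_regular_roles(500)
```

With these numbers the old limit of 100 is exceeded well before everyone is connected, while 500 leaves headroom.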
I am not certain on this one and I doubt it exists, but maybe there is an 'idle_timeout' after which SQLAlchemy shuts down the connection and reconnects if it is used again.
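For what it's worth, the closest thing I know of in SQLAlchemy is `pool_recycle`, which replaces a pooled connection older than the given number of seconds the next time it is checked out. It does not close idle connections on its own. A rough sketch, where the 900-second value and the `make_engine` helper are assumptions for illustration:

```python
# Hypothetical settings; tune for the actual workload.
ENGINE_OPTIONS = {
    # Replace connections older than 15 minutes at their next checkout.
    "pool_recycle": 900,
    # Test a connection with a lightweight ping before handing it out.
    "pool_pre_ping": True,
}


def make_engine(url):
    """Create an engine with the options above."""
    # Imported lazily so this module loads even without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(url, **ENGINE_OPTIONS)
```

Calling `engine.dispose()` at the end of a notebook closes the pool's connections right away, which is probably the more direct way to free slots.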
Yep, per our convo, I just set `tcp_keepalives_idle` to 900 seconds (15 minutes).
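One way to see whether the setting frees up slots is to count sessions per state in `pg_stat_activity` before and after. Note that `tcp_keepalives_idle` controls when TCP keepalive probes start on a quiet connection, so it mainly helps detect clients that have gone away. A sketch, assuming a psycopg2-style placeholder; the sample rows are made up, not real output:

```python
from collections import Counter

# Sessions for one database; the %s placeholder is psycopg2 style (assumed).
ACTIVITY_QUERY = (
    "SELECT usename, state FROM pg_stat_activity WHERE datname = %s;"
)


def summarize_states(rows):
    """Count sessions per state from (usename, state) rows."""
    return Counter(state for _user, state in rows)


# Made-up rows for illustration only.
sample_rows = [("snow", "idle"), ("snow", "idle"), ("snow", "active")]
summary = summarize_states(sample_rows)
```

A large and steady `idle` count would suggest notebooks are holding connections open rather than the server leaking them.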
I am closing this issue in favor of continuing with the logged SnowExSql issue
It seems like we can currently only have a limited number of simultaneous connections (not sure exactly how many or where this configuration lives). cc @micahjohnson150 @jomey @lsetiawan if you want to dig in.
The EC2 config is here: https://github.com/snowex-hackweek/jupyterhub/blob/main/terraform/eks/ec2_postgres.tf and the actual database setup is probably documented over in https://snowexsql.readthedocs.io/en/latest/
Full Traceback
```pytb
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/base.py in _wrap_pool_connect(self, fn, connection)
   2335 try:
-> 2336 return fn()
   2337 except dialect.dbapi.Error as e:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in connect(self)
    363 if not self._use_threadlocal:
--> 364 return _ConnectionFairy._checkout(self)
    365

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in _checkout(cls, pool, threadconns, fairy)
    777 if not fairy:
--> 778 fairy = _ConnectionRecord.checkout(pool)
    779

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in checkout(cls, pool)
    494 def checkout(cls, pool):
--> 495 rec = pool._do_get()
    496 try:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/impl.py in _do_get(self)
    139 with util.safe_reraise():
--> 140 self._dec_overflow()
    141 else:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py in __exit__(self, type_, value, traceback)
     67 if not self.warn_only:
---> 68 compat.raise_(
     69 exc_value,

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    181 try:
--> 182 raise exception
    183 finally:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/impl.py in _do_get(self)
    136 try:
--> 137 return self._create_connection()
    138 except:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in _create_connection(self)
    308
--> 309 return _ConnectionRecord(self)
    310

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __init__(self, pool, connect)
    439 if connect:
--> 440 self.__connect(first_connect_check=True)
    441 self.finalize_callback = deque()

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check)
    660 with util.safe_reraise():
--> 661 pool.logger.debug("Error on connect(): %s", e)
    662 else:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py in __exit__(self, type_, value, traceback)
     67 if not self.warn_only:
---> 68 compat.raise_(
     69 exc_value,

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    181 try:
--> 182 raise exception
    183 finally:

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check)
    655 self.starttime = time.time()
--> 656 connection = pool._invoke_creator(self)
    657 pool.logger.debug("Created new connection %r", connection)

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py in connect(connection_record)
    113 return connection
--> 114 return dialect.connect(*cargs, **cparams)
    115

/srv/conda/envs/notebook/lib/python3.8/site-packages/sqlalchemy/engine/default.py in connect(self, *cargs, **cparams)
    507 # inherits the docstring from interfaces.Dialect.connect
--> 508 return self.dbapi.connect(*cargs, **cparams)
    509

/srv/conda/envs/notebook/lib/python3.8/site-packages/psycopg2/__init__.py in connect(dsn, connection_factory, cursor_factory, **kwargs)
    121 dsn = _ext.make_dsn(dsn, **kwargs)
--> 122 conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
    123 if cursor_factory is not None:

OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser connections

The above exception was the direct cause of the following exception:

OperationalError                          Traceback (most recent call last)