newslynx / opportunities

A garden of NewsLynx futures
1 stars 0 forks source link

error, too many clients open #102

Closed mhkeller closed 9 years ago

mhkeller commented 9 years ago

when updating core

TASK: [update newslynx-core] **************************************************
<54.68.94.39> ESTABLISH CONNECTION FOR USER: ubuntu
<54.68.94.39> REMOTE_MODULE command chdir=/opt/newslynx/newslynx-core/ make app_install
<54.68.94.39> EXEC ssh -C -tt -vvv -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o Port=22 -o IdentityFile="/home/ubuntu/.ssh/Collector-MIP.pem" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=30 54.68.94.39 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1441079964.52-106798733634740 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1441079964.52-106798733634740 && echo $HOME/.ansible/tmp/ansible-tmp-1441079964.52-106798733634740'
<54.68.94.39> PUT /tmp/tmpzykxKt TO /home/ubuntu/.ansible/tmp/ansible-tmp-1441079964.52-106798733634740/command
<54.68.94.39> EXEC ssh -C -tt -vvv -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o Port=22 -o IdentityFile="/home/ubuntu/.ssh/Collector-MIP.pem" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=30 54.68.94.39 /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=hssrsxphflrgacymbdrsdnrfqppkcuqo] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-hssrsxphflrgacymbdrsdnrfqppkcuqo; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1441079964.52-106798733634740/command; rm -rf /home/ubuntu/.ansible/tmp/ansible-tmp-1441079964.52-106798733634740/ >/dev/null 2>&1'"'"''
failed: [newslynx] => {"changed": true, "cmd": ["make", "app_install"], "delta": "0:00:06.641510", "end": "2015-09-01 03:59:55.061859", "rc": 2, "start": "2015-09-01 03:59:48.420349", "warnings": []}
stderr: Traceback (most recent call last):
  File "/usr/local/bin/newslynx", line 9, in <module>
    load_entry_point('newslynx==1.1.5', 'console_scripts', 'newslynx')()
  File "/usr/local/lib/python2.7/dist-packages/newslynx/cli/__init__.py", line 129, in run
    from newslynx.cli import init
  File "/usr/local/lib/python2.7/dist-packages/newslynx/cli/init.py", line 11, in <module>
    from newslynx.init import load_sql
  File "/usr/local/lib/python2.7/dist-packages/newslynx/init.py", line 9, in <module>
    from newslynx.core import settings
  File "/usr/local/lib/python2.7/dist-packages/newslynx/core.py", line 80, in <module>
    raise ConfigError(format_exc())
newslynx.exc.ConfigError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/newslynx/core.py", line 77, in <module>
    db_session.execute('SET TIMEZONE TO UTC')
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/scoping.py", line 150, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1023, in execute
    bind, close_with_result=True).execute(clause, params or {})
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 887, in _connection_for_bind
    engine, execution_options)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 334, in _connection_for_bind
    conn = bind.contextual_connect()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/threadlocal.py", line 64, in contextual_connect
    self.pool.connect, connection),
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
    e, dialect, self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception_noconnection
    exc_info
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 376, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 713, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 480, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1060, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1057, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 323, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 449, in __init__
    self.connection = self.__connect()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 607, in __connect
    connection = self.__pool._invoke_creator(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 97, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 385, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python2.7/dist-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
OperationalError: (psycopg2.OperationalError) FATAL:  sorry, too many clients already
FATAL:  sorry, too many clients already
mhkeller commented 9 years ago

putting this on hold for now. low priority.

mhkeller commented 9 years ago

reopening because it is coming up in sous chefs running as well as running make update

if you run top there are about four dozen postgresql processes still hanging around. there must be somewhere that isn't closing these connections.

adding this to make update helps but is not a real solution

sudo service postgresql restart
abelsonlive commented 9 years ago

Will leave open until further testing, but I'm fairly certain this commit will fix the issue. In the traceback you can see the error is occurring here:

newslynx.exc.ConfigError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/newslynx/core.py", line 77, in <module>
    db_session.execute('SET TIMEZONE TO UTC')

I had been using db_session to make requests to the database outside of the context of the Flask app, since this is a limitation of Flask-SQLAlchemy. The problem is that this was being called every time a sous chef was loaded, since I check whether they have valid python paths and are installed by loading the modules. Since each sous chef uses underlying elements of newslynx-core, every time a sous chef was loaded, so was newslynx-core, and thus this session as well. The problem is that it was never being closes. Now I'll just generate these sessions using a function, and load them were I need them.

abelsonlive commented 9 years ago

has this come up again @mhkeller ?

mhkeller commented 9 years ago

This has been okay

Le Sep 12, 2015 à 01:39, Brian Abelson notifications@github.com a écrit :

has this come up again @mhkeller ?

— Reply to this email directly or view it on GitHub.

mhkeller commented 9 years ago

Just got it loading article detail view

"Traceback (most recent call last):  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)  File "/usr/local/lib/python2.7/dist-packages/newslynx/views/decorators.py", line 35, in decorated_function
return f(*args, **kw)  File "/usr/local/lib/python2.7/dist-packages/newslynx/views/decorators.py", line 75, in decorated_function
return f(*args, **kw)  File "/usr/local/lib/python2.7/dist-packages/newslynx/views/api/events_api.py", line 294, in search_events
for by, result in event_facet_pool.imap_unordered(fx, kw['facets']):  File "/usr/local/lib/python2.7/dist-packages/gevent/pool.py", line 222, in next
raise value.excOperationalError: (psycopg2.OperationalError) FATAL:  sorry, too many clients alreadyFATAL:  sorry, too many clients already"
mhkeller commented 9 years ago

This error is still coming up frequently.

mhkeller commented 9 years ago

currently limit in postgresql conf is 100 fyi

mhkeller commented 9 years ago

Using the automation setup, to temporarily increase limit:

vagrant ssh
sudo -u postgres psql
SHOW config_file;

That will print the location of the postgres conf file

sudo vi <file>

Change the limit to 500. Logout with logout and then make update to restart.

abelsonlive commented 9 years ago

This is going to require a more serious audit. Would be good to be able to sit down with you bc i don't have access to prod server. Hard to know exactly where the sessions are not getting closed without seeing the step-by-step process you're taking which generates them.

abelsonlive commented 9 years ago

This issue seems relevant:

http://stackoverflow.com/questions/25768428/sqlalchemy-connection-errors

I'll need to go through my sql code and check if I'm sending multiple commands in a single call to db.session.execute. I could also try not using gevent with gunicorn.

abelsonlive commented 9 years ago

A handy postgresql command for checking idle postgres processes:

select * from pg_stat_activity where state = 'idle in transaction';
abelsonlive commented 9 years ago

I should have cleaned up all of the session leaks. I'm also going to add a simple cronjob which kills idle postgres transactions:

$ ps aux | grep postgres | grep 'idle in transaction' | awk '{print $2}' | xargs kill -9
abelsonlive commented 9 years ago

added script to automation which kills idle transactions older than 15 minutes every 10 minutes. https://github.com/newslynx/automation/commit/19ecbf09d801891fb46197f005e5ebbb2eea0a28