Internal Server Error - "Can't reconnect until invalid transaction is rolled back"

raegedoc commented 2 years ago

Please note that this is only for bug report.

Prerequisites

[X] I have searched open and closed issues to make sure that the bug has not yet been reported.

Bug report

Describe the bug Self-hosted setup. If I let the SL-app logged in and running standstill, later on the next forced refresh (F5), I get an "Internal System Error" because of an error stating "Can't reconnect until invalid transaction is rolled back"

I see in the Docker log for SimpleLogin sl-app:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
      conn = self._revalidate_connection()
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 469, in _revalidate_connection
      raise exc.InvalidRequestError(
  sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back
  The above exception was the direct cause of the following exception:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 134, in handle
      self.handle_request(listener, req, client, addr)
    File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
      respiter = self.wsgi(environ, resp.start_response)
    File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2464, in __call__
      return self.wsgi_app(environ, start_response)
    File "/usr/local/lib/python3.10/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
      return self.app(environ, start_response)
    File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2450, in wsgi_app
      response = self.handle_exception(e)
    File "/usr/local/lib/python3.10/site-packages/flask_cors/extension.py", line 165, in wrapped_function
      return cors_after_request(app.make_response(f(*args, **kwargs)))
    File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1879, in handle_exception
      server_error = handler(server_error)
    File "/code/server.py", line 386, in error_handler
      return render_template("error/500.html"), 500
    File "/usr/local/lib/python3.10/site-packages/flask/templating.py", line 136, in render_template
      ctx.app.update_template_context(context)
    File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 838, in update_template_context
      context.update(func())
    File "/usr/local/lib/python3.10/site-packages/flask_login/utils.py", line 379, in _user_context_processor
      return dict(current_user=_get_user())
    File "/usr/local/lib/python3.10/site-packages/flask_login/utils.py", line 346, in _get_user
      current_app.login_manager._load_user()
    File "/usr/local/lib/python3.10/site-packages/flask_login/login_manager.py", line 318, in _load_user
      user = self._user_callback(user_id)
    File "/code/server.py", line 220, in load_user
      user = User.get_by(alternative_id=alternative_id)
    File "/code/app/models.py", line 82, in get_by
      return Session.query(cls).filter_by(**kw).first()
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3429, in first
      ret = list(self[0:1])
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
      return list(res)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
      return self._execute_and_instances(context)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
      result = conn.execute(querycontext.statement, self._params)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
      return meth(self, multiparams, params)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
      return connection._execute_clauseelement(self, multiparams, params)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
      ret = self._execute_context(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1206, in _execute_context
      self._handle_dbapi_exception(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
      util.raise_(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
      raise exception
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
      conn = self._revalidate_connection()
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 469, in _revalidate_connection
      raise exc.InvalidRequestError(
  sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) ### Can't reconnect until invalid transaction is rolled back
  [SQL: SELECT users.directory_quota AS users_directory_quota, users.subdomain_quota AS users_subdomain_quota, users.password AS users_password, users.id AS users_id, users.created_at AS users_created_at, users.updated_at AS users_updated_at, users.email AS users_email, users.name AS users_name, users.is_admin AS users_is_admin, users.alias_generator AS users_alias_generator, users.notification AS users_notification, users.activated AS users_activated, users.disabled AS users_disabled, users.profile_picture_id AS users_profile_picture_id, users.otp_secret AS users_otp_secret, users.enable_otp AS users_enable_otp, users.last_otp AS users_last_otp, users.fido_uuid AS users_fido_uuid, users.default_alias_custom_domain_id AS users_default_alias_custom_domain_id, users.default_alias_public_domain_id AS users_default_alias_public_domain_id, users.lifetime AS users_lifetime, users.paid_lifetime AS users_paid_lifetime, users.lifetime_coupon_id AS users_lifetime_coupon_id, users.trial_end AS users_trial_end, users.default_mailbox_id AS users_default_mailbox_id, users.sender_format AS users_sender_format, users.sender_format_updated_at AS users_sender_format_updated_at, users.replace_reverse_alias AS users_replace_reverse_alias, users.referral_id AS users_referral_id, users.intro_shown AS users_intro_shown, users.max_spam_score AS users_max_spam_score, users.newsletter_alias_id AS users_newsletter_alias_id, users.include_sender_in_reverse_alias AS users_include_sender_in_reverse_alias, users.random_alias_suffix AS users_random_alias_suffix, users.expand_alias_info AS users_expand_alias_info, users.ignore_loop_email AS users_ignore_loop_email, users.alternative_id AS users_alternative_id, users.disable_automatic_alias_note AS users_disable_automatic_alias_note, users.one_click_unsubscribe_block_sender AS users_one_click_unsubscribe_block_sender, users.include_website_in_one_click_alias AS users_include_website_in_one_click_alias, users.disable_import AS users_disable_import, users.can_use_phone AS users_can_use_phone, users.phone_quota AS users_phone_quota, users.block_behaviour AS users_block_behaviour, users.include_header_email_header AS users_include_header_email_header, users.flags AS users_flags, users.unsub_behaviour AS users_unsub_behaviour 
  FROM users 
  WHERE users.alternative_id = %(alternative_id_1)s 
   LIMIT %(param_1)s]
  [parameters: [immutabledict({})]]

For PostgreSQL's log, nothing but:

2022-10-18 01:58:28.482 GMT [97] LOG:  could not receive data from client: Connection reset by peer
2022-10-18 02:13:13.218 GMT [95] LOG:  could not receive data from client: Connection reset by peer

After another refresh (F5), some more errors are still logged into SimpleLogin sl-app :

  2022-10-18 02:58:08,654 - SL - DEBUG - 8 - "/code/server.py:278" - after_request() -  - 10.0.0.4 GET /dashboard/ ImmutableMultiDict([]) 200, takes 0.07948565483093262
  2022-10-18 02:58:08,796 - SL - ERROR - 7 - "/code/server.py:382" - error_handler() -  - (psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
  [SQL: SELECT api_key.id AS api_key_id, api_key.created_at AS api_key_created_at, api_key.updated_at AS api_key_updated_at, api_key.user_id AS api_key_user_id, api_key.code AS api_key_code, api_key.name AS api_key_name, api_key.last_used AS api_key_last_used, api_key.times AS api_key_times, api_key.sudo_mode_at AS api_key_sudo_mode_at 
  FROM api_key 
  WHERE api_key.code IS NULL 
   LIMIT %(param_1)s]
  [parameters: {'param_1': 1}]
  (Background on this error at: http://sqlalche.me/e/13/e3q8)
  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
      self.dialect.do_execute(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
      cursor.execute(statement, parameters)
  psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
  The above exception was the direct cause of the following exception:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1950, in full_dispatch_request
      rv = self.dispatch_request()
    File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1936, in dispatch_request
      return self.view_functions[rule.endpoint](**req.view_args)
    File "/code/app/api/base.py", line 49, in decorated
      error_return = authorize_request()
    File "/code/app/api/base.py", line 18, in authorize_request
      api_key = ApiKey.get_by(code=api_code)
    File "/code/app/models.py", line 82, in get_by
      return Session.query(cls).filter_by(**kw).first()
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3429, in first
      ret = list(self[0:1])
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
      return list(res)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
      return self._execute_and_instances(context)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
      result = conn.execute(querycontext.statement, self._params)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
      return meth(self, multiparams, params)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
      return connection._execute_clauseelement(self, multiparams, params)
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
      ret = self._execute_context(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
      self._handle_dbapi_exception(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
      util.raise_(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
      raise exception
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
      self.dialect.do_execute(
    File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
      cursor.execute(statement, parameters)
  sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) ### server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
  [SQL: SELECT api_key.id AS api_key_id, api_key.created_at AS api_key_created_at, api_key.updated_at AS api_key_updated_at, api_key.user_id AS api_key_user_id, api_key.code AS api_key_code, api_key.name AS api_key_name, api_key.last_used AS api_key_last_used, api_key.times AS api_key_times, api_key.sudo_mode_at AS api_key_sudo_mode_at 
  FROM api_key 
  WHERE api_key.code IS NULL 
   LIMIT %(param_1)s]
  [parameters: {'param_1': 1}]
  (Background on this error at: http://sqlalche.me/e/13/e3q8)
  2022-10-18 02:58:08,800 - SL - DEBUG - 7 - "/code/server.py:278" - after_request() -  - 10.0.0.4 GET /api/notifications ImmutableMultiDict([('page', '0')]) 500, takes 0.006587982177734375
  2022-10-18 02:58:14,644 - SL - DEBUG - 7 - "/code/server.py:278" - after_request() -  - 10.0.0.4 GET /dashboard/ ImmutableMultiDict([]) 200, takes 0.07764196395874023
  2022-10-18 02:58:14,791 - SL - DEBUG - 7 - "/code/server.py:278" - after_request() -  - 10.0.0.4 GET /api/notifications ImmutableMultiDict([('page', '0')]) 200, takes 0.008523941040039062
  2022-10-18 02:58:21,478 - SL - DEBUG - 8 - "/code/server.py:278" - after_request() -  - 10.0.0.4 GET /dashboard/ ImmutableMultiDict([]) 200, takes 0.07257390022277832
  2022-10-18 02:58:21,593 - SL - DEBUG - 8 - "/code/server.py:278" - after_request() -  - 10.0.0.4 GET /api/notifications ImmutableMultiDict([('page', '0')]) 200, takes 0.00901174545288086

Expected behavior Should still be running without error or dropping "connection"

Environment (If applicable):

SL App host OS: Linux Ubuntu 20.04 LTS
Browser host OS: Windows 10 (latest)
Browser: Edge, Firebox
PostgreSQL 12.x, 13.8, 14.5 with same result
SL App v4.6.2-beta, v4.14.15 and v4.14.22 with same result
Postfix 3.6.4

Additional context I can do more testing if required

nguyenkims commented 2 years ago

It seems to be an error due to the connection to the database from SL container. I'd suggest to connect to one of the SL containers (via docker exec), then test whether you can connect to the DB from there. You can also re-run the setup, I suspect the error comes from the docker network step.

raegedoc commented 2 years ago

Hi Son, Thank you for your reply. I must say my setup is running fine. I can have access to the GUI, send and receive email thru my aliases, create contacts. The only problem I have is that sometimes, there is an message stating "Internal Server Error" and some logs in sl-app stating "Can't reconnect until invalid transaction is rolled back" around the same SQL QUERY you can see in my logs.

2022-10-25 03:51:31,969 - SL - ERROR - 6 - "/code/server.py:385" - error_handler() - - 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Even if sl-app is stuck with "Internal System Error", I can still connect to the database, directly of thru another docker container. Because of this, I can confirm my docker network is working.

Sometimes, the only way to bypass the error message is to restart SimpleLogin.

Here's the proof of my working GUI with sent/reveived count:

sl-app - Issue 1355 - Proof of working installation

Is there anything special around the query : SELECT users.directory_quota AS users_directory_quota, ... ?

nguyenkims commented 2 years ago

@Raegedoc oh I see, if this is an intermittent error, I suspect it's because the database connection is "expired". It can be the case if the website is only used from to time. In this case, I think you don't need to restart the server, just refresh the website and a new web worker should be spawn and create a new connection to the database.

raegedoc commented 2 years ago

Hi Son, it's effectively intermittent. Thank you for confirming my setup is indeed functional :)

raegedoc commented 2 years ago

Hi Son, To prevent such errors, wouldn't be possible to have a recurring task just pinging the database within sl-app to keep the connection open ?

To validate the testcase, I added an extension to my Edge browser to refresh the page every 4 minutes, it works but sometimes, the same query still fails. I had no interaction with sl-app GUI and my database is still available to other components/apps I have in my environment.

Would it be possible to investigate on your side around that code ?

Thank you

simplelogin_webapp_logs.txt

nguyenkims commented 2 years ago

@Raegedoc we have pushed a new image 4.6.3-beta with the new option SQLALCHEMY_POOL_PRE_PING which should help with the lost connection issue. Can you set this option in your config file and try with the new image?

raegedoc commented 2 years ago

@nguyenkims thank you for providing a test release. It's running as of now and I will provide further feedbacks tomorrow. It's running in the same environment as before so no change on that side.

I also added this to file simplelogin.env:

# SQLalchemy fix
SQLALCHEMY_POOL_PRE_PING=1

To test if any sqlalchemy exception would occur, I do a complete refresh of the sl-app webgui every 3 minutes using a Microsoft Edge extension named "Auto Refresh Plus".

Thanks!

raegedoc commented 2 years ago

@nguyenkims, great news! The code change you implemented does fix the recurring problem except for one small detail.

In 23 hours I only had one (1) episode of an error in the log even thru if the sl-app GUI never indicated anything like "Internal Server Error". I sometime changed to a different tab (Aliases, Mailboxes, Domains, etc) for testing different queries.

simplelogin 463beta wepapp_log.txt

What is the codebase of 4.6.3-beta ? How it differs from simplelogin/app-ci:v4.16.0 and the current code of GitHub ?

Thank you!

nguyenkims commented 2 years ago

@Raegedoc glad that the error seems to go away. With the new version, if the app has an expired connection (that ends up with the error you see in the log), it will automatically try to get a new connection.

What is the codebase of 4.6.3-beta ? How it differs from simplelogin/app-ci:v4.16.0 and the current code of GitHub ?

The code is on https://github.com/simple-login/app/tree/4.6.2-beta and it's going to be the next version we release.

raegedoc commented 2 years ago

@nguyenkims, happy to help

I commented your fix for a small code change: https://github.com/simple-login/app/commit/0fd00d8dd8cd2ed43447d4cbdf1ba1b868b0dd8f

Thanks!

Queuecumber commented 2 years ago

I'm seeing the same error periodically, I think this fix isn't on master though, is this a temporary fix for 4.6.2 with a more permanent fix planned or is there a plan to eventually land this on master?

nguyenkims commented 2 years ago

@Queuecumber the fix will be merged back to master indeed.

Queuecumber commented 2 years ago

I cherry picked 0fd00d8 onto my own master and I still get the following error periodically

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 134, in handle
    self.handle_request(listener, req, client, addr)
  File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.10/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
    return self.app(environ, start_response)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2450, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1879, in handle_exception
    server_error = handler(server_error)
  File "/code/server.py", line 393, in error_handler
    return render_template("error/500.html"), 500
  File "/usr/local/lib/python3.10/site-packages/flask/templating.py", line 136, in render_template
    ctx.app.update_template_context(context)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 838, in update_template_context
    context.update(func())
  File "/usr/local/lib/python3.10/site-packages/flask_login/utils.py", line 379, in _user_context_processor
    return dict(current_user=_get_user())
  File "/usr/local/lib/python3.10/site-packages/flask_login/utils.py", line 346, in _get_user
    current_app.login_manager._load_user()
  File "/usr/local/lib/python3.10/site-packages/flask_login/login_manager.py", line 318, in _load_user
    user = self._user_callback(user_id)
  File "/code/server.py", line 227, in load_user
    user = User.get_by(alternative_id=alternative_id)
  File "/code/app/models.py", line 82, in get_by
    return Session.query(cls).filter_by(**kw).first()
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3429, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1206, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
    conn = self._revalidate_connection()
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 469, in _revalidate_connection
    raise exc.InvalidRequestError(
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
[SQL: SELECT users.directory_quota AS users_directory_quota, users.subdomain_quota AS users_subdomain_quota, users.password AS users_password, users.id AS users_id, users.created_at AS users_created_at, users.updated_at AS users_updated_at, users.email AS users_email, users.name AS users_name, users.is_admin AS users_is_admin, users.alias_generator AS users_alias_generator, users.notification AS users_notification, users.activated AS users_activated, users.disabled AS users_disabled, users.profile_picture_id AS users_profile_picture_id, users.otp_secret AS users_otp_secret, users.enable_otp AS users_enable_otp, users.last_otp AS users_last_otp, users.fido_uuid AS users_fido_uuid, users.default_alias_custom_domain_id AS users_default_alias_custom_domain_id, users.default_alias_public_domain_id AS users_default_alias_public_domain_id, users.lifetime AS users_lifetime, users.paid_lifetime AS users_paid_lifetime, users.lifetime_coupon_id AS users_lifetime_coupon_id, users.trial_end AS users_trial_end, users.default_mailbox_id AS users_default_mailbox_id, users.sender_format AS users_sender_format, users.sender_format_updated_at AS users_sender_format_updated_at, users.replace_reverse_alias AS users_replace_reverse_alias, users.referral_id AS users_referral_id, users.intro_shown AS users_intro_shown, users.max_spam_score AS users_max_spam_score, users.newsletter_alias_id AS users_newsletter_alias_id, users.include_sender_in_reverse_alias AS users_include_sender_in_reverse_alias, users.random_alias_suffix AS users_random_alias_suffix, users.expand_alias_info AS users_expand_alias_info, users.ignore_loop_email AS users_ignore_loop_email, users.alternative_id AS users_alternative_id, users.disable_automatic_alias_note AS users_disable_automatic_alias_note, users.one_click_unsubscribe_block_sender AS users_one_click_unsubscribe_block_sender, users.include_website_in_one_click_alias AS users_include_website_in_one_click_alias, users.disable_import AS users_disable_import, users.can_use_phone AS users_can_use_phone, users.phone_quota AS users_phone_quota, users.block_behaviour AS users_block_behaviour, users.include_header_email_header AS users_include_header_email_header, users.flags AS users_flags, users.unsub_behaviour AS users_unsub_behaviour 
FROM users 
WHERE users.alternative_id = %(alternative_id_1)s 
 LIMIT %(param_1)s]
[parameters: [immutabledict({})]]

I actually truncated this because the entire error is quite long, is there something else I need to do to apply the fix?

I made sure that SQLALCHEMY_POOL_PRE_PING is set and I confirmed that app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {"pool_pre_ping": True} is being executed using print statements.

raegedoc commented 2 years ago

@Queuecumber, even if pre_ping is enabled, I still get the error sometimes.

I asked myself... What if the error is not SimpleLogin but something else ?

Last week-end I builded a new database server outside of Docker. My new build is based on Ubuntu 20.04 with PostgreSQL 14.6 on a single node (not HA). To my surprise, I didn't get any error since...

Just updated to version v4.18.2 and disabled "pre_ping". Pursuing my investigation.

Queuecumber commented 2 years ago

I do recall seeing something weird in my postgres logs, I'll look it up later today, unfortunately I'm on kubernetes so not-docker isn't an option for me

Queuecumber commented 2 years ago

2022-11-21 03:32:42.572 UTC [1] LOG:  database system is ready to accept connections
2022-11-21 05:33:23.096 UTC [38] LOG:  could not receive data from client: Connection reset by peer
2022-11-21 05:33:23.096 UTC [40] LOG:  could not receive data from client: Connection reset by peer
2022-11-21 05:33:23.096 UTC [39] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 00:39:23.672 UTC [8409] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 00:39:23.672 UTC [8410] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 00:44:51.352 UTC [8428] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 01:59:16.878 UTC [37] LOG:  unexpected EOF on client connection with an open transaction
2022-11-27 03:59:16.760 UTC [8626] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 04:05:00.824 UTC [8628] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 04:05:00.824 UTC [8627] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 12:45:13.442 UTC [8624] LOG:  unexpected EOF on client connection with an open transaction
2022-11-27 14:45:54.072 UTC [9279] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 14:45:54.072 UTC [9277] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 14:45:54.072 UTC [9278] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 15:54:42.843 UTC [9345] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 15:54:42.843 UTC [9346] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 16:00:10.520 UTC [9363] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 16:08:06.495 UTC [9276] LOG:  unexpected EOF on client connection with an open transaction
2022-11-27 16:24:38.935 UTC [9377] LOG:  could not receive data from client: Connection timed out
2022-11-27 18:09:03.768 UTC [9496] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 18:09:03.768 UTC [9495] LOG:  could not receive data from client: Connection reset by peer
2022-11-27 18:10:50.263 UTC [9494] LOG:  could not receive data from client: Connection reset by peer

This is the kind of stuff I see in my postgres log, it does seem to me like it's a client issue but hard to tell

Queuecumber commented 2 years ago

I'm trying to do more debugging on this and I set app.config["SQLALCHEMY_ECHO"] = "debug" and app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {"pool_pre_ping": True, "echo_pool": "debug"} in create_app which should result in extremely verbose output, but I'm not seeing any of the sqlalchemy stuff logged. Is there someplace else that this output is being overridden?

Queuecumber commented 2 years ago

@nguyenkims Are you sure that the code in server.py create_app even does anything? After looking into this further I don't think simplelogin uses Flask-SQLAlchemy and it definitely doesn't call init_app on a SQLAlchemy object. I was able to get debug output by editing https://github.com/simple-login/app/blob/4.6.2-beta/app/db.py#L8 and adding echo_pool=debug there. I left off the pool_pre_ping option to get a baseline test but if this is where the argument should have gone it definitely explains why the previous fix didnt work

Queuecumber commented 2 years ago

With that debug output this seems to what's causing the error to happen

2022-12-01 15:00:01,219 INFO sqlalchemy.pool.impl.QueuePool Invalidate connection <connection object at 0x7f91f8fdd800; dsn: 'user=myuser password=xxx dbname=simplelogin host=simplelogin-db port=5432 application_name=webapp', closed: 2> (reason: OperationalError:server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
)
2022-12-01 15:00:01,219 DEBUG sqlalchemy.pool.impl.QueuePool Closing connection <connection object at 0x7f91f8fdd800; dsn: 'user=myuser password=xxx dbname=simplelogin host=simplelogin-db port=5432 application_name=webapp', closed: 2>

Will now try w/ pool_pre_ping

Queuecumber commented 2 years ago

Ok I am getting closer to figuring this out. Here's a summary of my current findings

simplelogin doesnt use flask-sqlalchemy, the flags set in server.py create_app do nothing and should be removed
Adding pool_pre_ping to the correct place in db.py does not fix the problem. That's because pre-ping only happens during a session checkout and the failure we are observing is a dropped connection on initiation of a query (so after checkout)
For @Raegedoc his issue is resolved by running the postgres server outside docker (please post if that information is out of date) for myself, I moved the postgres server from an external kubernetes pod that I was using to simplify external queries into the same kubernetes pod as the simplelogin containers. I then modified them to use localhost as their database server. I have not had the crash since doing this.

So I have to conclude that this is some incompatibility in the way the docker network is functioning. However, I still don't have an explanation of exactly what is going wrong and why much less how to fix it short of the workarounds described in point (3). This is particularly odd since the official self-host instructions use docker and we are not seeing mass-reports of this error.

I will try to keep investigating this

raegedoc commented 1 year ago

Hi! It has been 8 days and my setup is running perfectly without any error when PostgreSQL is external to Docker.

I'll make PostgreSQL HA in the coming weeks and comment back on the matter.

Thank you @Queuecumber for your efforts :)

nguyenkims commented 1 year ago

@Queuecumber we actually also run Postgres outside of Docker in our SaaS version. It is generally not recommended to run database inside Docker for a heavy load (it's fine for a personal usage).

MuhammadNaeem42 commented 1 year ago

i will help you to setup this application contact me https://www.fiverr.com/s/9W3PNA simple

simple-login / app

Internal Server Error - "Can't reconnect until invalid transaction is rolled back" #1355

Prerequisites

Bug report