Open raegedoc opened 2 years ago
It seems to be an error due to the connection to the database from SL container. I'd suggest to connect to one of the SL containers (via docker exec
), then test whether you can connect to the DB from there. You can also re-run the setup, I suspect the error comes from the docker network step.
Hi Son, Thank you for your reply. I must say my setup is running fine. I can have access to the GUI, send and receive email thru my aliases, create contacts. The only problem I have is that sometimes, there is an message stating "Internal Server Error" and some logs in sl-app stating "Can't reconnect until invalid transaction is rolled back" around the same SQL QUERY you can see in my logs.
2022-10-25 03:51:31,969 - SL - ERROR - 6 - "/code/server.py:385" - error_handler() - - 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Even if sl-app is stuck with "Internal System Error", I can still connect to the database, directly of thru another docker container. Because of this, I can confirm my docker network is working.
Sometimes, the only way to bypass the error message is to restart SimpleLogin.
Here's the proof of my working GUI with sent/reveived count:
Is there anything special around the query : SELECT users.directory_quota AS users_directory_quota, ... ?
@Raegedoc oh I see, if this is an intermittent error, I suspect it's because the database connection is "expired". It can be the case if the website is only used from to time. In this case, I think you don't need to restart the server, just refresh the website and a new web worker should be spawn and create a new connection to the database.
Hi Son, it's effectively intermittent. Thank you for confirming my setup is indeed functional :)
Hi Son, To prevent such errors, wouldn't be possible to have a recurring task just pinging the database within sl-app to keep the connection open ?
To validate the testcase, I added an extension to my Edge browser to refresh the page every 4 minutes, it works but sometimes, the same query still fails. I had no interaction with sl-app GUI and my database is still available to other components/apps I have in my environment.
Would it be possible to investigate on your side around that code ?
Thank you
@Raegedoc we have pushed a new image 4.6.3-beta with the new option SQLALCHEMY_POOL_PRE_PING
which should help with the lost connection issue. Can you set this option in your config file and try with the new image?
@nguyenkims thank you for providing a test release. It's running as of now and I will provide further feedbacks tomorrow. It's running in the same environment as before so no change on that side.
I also added this to file simplelogin.env:
# SQLalchemy fix
SQLALCHEMY_POOL_PRE_PING=1
To test if any sqlalchemy exception would occur, I do a complete refresh of the sl-app webgui every 3 minutes using a Microsoft Edge extension named "Auto Refresh Plus".
Thanks!
@nguyenkims, great news! The code change you implemented does fix the recurring problem except for one small detail.
In 23 hours I only had one (1) episode of an error in the log even thru if the sl-app GUI never indicated anything like "Internal Server Error". I sometime changed to a different tab (Aliases, Mailboxes, Domains, etc) for testing different queries.
simplelogin 463beta wepapp_log.txt
What is the codebase of 4.6.3-beta ? How it differs from simplelogin/app-ci:v4.16.0 and the current code of GitHub ?
Thank you!
@Raegedoc glad that the error seems to go away. With the new version, if the app has an expired connection (that ends up with the error you see in the log), it will automatically try to get a new connection.
What is the codebase of 4.6.3-beta ? How it differs from simplelogin/app-ci:v4.16.0 and the current code of GitHub ?
The code is on https://github.com/simple-login/app/tree/4.6.2-beta and it's going to be the next version we release.
@nguyenkims, happy to help
I commented your fix for a small code change: https://github.com/simple-login/app/commit/0fd00d8dd8cd2ed43447d4cbdf1ba1b868b0dd8f
Thanks!
I'm seeing the same error periodically, I think this fix isn't on master though, is this a temporary fix for 4.6.2 with a more permanent fix planned or is there a plan to eventually land this on master?
@Queuecumber the fix will be merged back to master indeed.
I cherry picked 0fd00d8 onto my own master and I still get the following error periodically
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.10/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
return self.app(environ, start_response)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2450, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.10/site-packages/flask_cors/extension.py", line 165, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1879, in handle_exception
server_error = handler(server_error)
File "/code/server.py", line 393, in error_handler
return render_template("error/500.html"), 500
File "/usr/local/lib/python3.10/site-packages/flask/templating.py", line 136, in render_template
ctx.app.update_template_context(context)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 838, in update_template_context
context.update(func())
File "/usr/local/lib/python3.10/site-packages/flask_login/utils.py", line 379, in _user_context_processor
return dict(current_user=_get_user())
File "/usr/local/lib/python3.10/site-packages/flask_login/utils.py", line 346, in _get_user
current_app.login_manager._load_user()
File "/usr/local/lib/python3.10/site-packages/flask_login/login_manager.py", line 318, in _load_user
user = self._user_callback(user_id)
File "/code/server.py", line 227, in load_user
user = User.get_by(alternative_id=alternative_id)
File "/code/app/models.py", line 82, in get_by
return Session.query(cls).filter_by(**kw).first()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3429, in first
ret = list(self[0:1])
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
return list(res)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1206, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1202, in _execute_context
conn = self._revalidate_connection()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 469, in _revalidate_connection
raise exc.InvalidRequestError(
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
[SQL: SELECT users.directory_quota AS users_directory_quota, users.subdomain_quota AS users_subdomain_quota, users.password AS users_password, users.id AS users_id, users.created_at AS users_created_at, users.updated_at AS users_updated_at, users.email AS users_email, users.name AS users_name, users.is_admin AS users_is_admin, users.alias_generator AS users_alias_generator, users.notification AS users_notification, users.activated AS users_activated, users.disabled AS users_disabled, users.profile_picture_id AS users_profile_picture_id, users.otp_secret AS users_otp_secret, users.enable_otp AS users_enable_otp, users.last_otp AS users_last_otp, users.fido_uuid AS users_fido_uuid, users.default_alias_custom_domain_id AS users_default_alias_custom_domain_id, users.default_alias_public_domain_id AS users_default_alias_public_domain_id, users.lifetime AS users_lifetime, users.paid_lifetime AS users_paid_lifetime, users.lifetime_coupon_id AS users_lifetime_coupon_id, users.trial_end AS users_trial_end, users.default_mailbox_id AS users_default_mailbox_id, users.sender_format AS users_sender_format, users.sender_format_updated_at AS users_sender_format_updated_at, users.replace_reverse_alias AS users_replace_reverse_alias, users.referral_id AS users_referral_id, users.intro_shown AS users_intro_shown, users.max_spam_score AS users_max_spam_score, users.newsletter_alias_id AS users_newsletter_alias_id, users.include_sender_in_reverse_alias AS users_include_sender_in_reverse_alias, users.random_alias_suffix AS users_random_alias_suffix, users.expand_alias_info AS users_expand_alias_info, users.ignore_loop_email AS users_ignore_loop_email, users.alternative_id AS users_alternative_id, users.disable_automatic_alias_note AS users_disable_automatic_alias_note, users.one_click_unsubscribe_block_sender AS users_one_click_unsubscribe_block_sender, users.include_website_in_one_click_alias AS users_include_website_in_one_click_alias, users.disable_import AS users_disable_import, users.can_use_phone AS users_can_use_phone, users.phone_quota AS users_phone_quota, users.block_behaviour AS users_block_behaviour, users.include_header_email_header AS users_include_header_email_header, users.flags AS users_flags, users.unsub_behaviour AS users_unsub_behaviour
FROM users
WHERE users.alternative_id = %(alternative_id_1)s
LIMIT %(param_1)s]
[parameters: [immutabledict({})]]
I actually truncated this because the entire error is quite long, is there something else I need to do to apply the fix?
I made sure that SQLALCHEMY_POOL_PRE_PING
is set and I confirmed that app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {"pool_pre_ping": True}
is being executed using print statements.
@Queuecumber, even if pre_ping is enabled, I still get the error sometimes.
I asked myself... What if the error is not SimpleLogin but something else ?
Last week-end I builded a new database server outside of Docker. My new build is based on Ubuntu 20.04 with PostgreSQL 14.6 on a single node (not HA). To my surprise, I didn't get any error since...
Just updated to version v4.18.2 and disabled "pre_ping". Pursuing my investigation.
I do recall seeing something weird in my postgres logs, I'll look it up later today, unfortunately I'm on kubernetes so not-docker isn't an option for me
2022-11-21 03:32:42.572 UTC [1] LOG: database system is ready to accept connections
2022-11-21 05:33:23.096 UTC [38] LOG: could not receive data from client: Connection reset by peer
2022-11-21 05:33:23.096 UTC [40] LOG: could not receive data from client: Connection reset by peer
2022-11-21 05:33:23.096 UTC [39] LOG: could not receive data from client: Connection reset by peer
2022-11-27 00:39:23.672 UTC [8409] LOG: could not receive data from client: Connection reset by peer
2022-11-27 00:39:23.672 UTC [8410] LOG: could not receive data from client: Connection reset by peer
2022-11-27 00:44:51.352 UTC [8428] LOG: could not receive data from client: Connection reset by peer
2022-11-27 01:59:16.878 UTC [37] LOG: unexpected EOF on client connection with an open transaction
2022-11-27 03:59:16.760 UTC [8626] LOG: could not receive data from client: Connection reset by peer
2022-11-27 04:05:00.824 UTC [8628] LOG: could not receive data from client: Connection reset by peer
2022-11-27 04:05:00.824 UTC [8627] LOG: could not receive data from client: Connection reset by peer
2022-11-27 12:45:13.442 UTC [8624] LOG: unexpected EOF on client connection with an open transaction
2022-11-27 14:45:54.072 UTC [9279] LOG: could not receive data from client: Connection reset by peer
2022-11-27 14:45:54.072 UTC [9277] LOG: could not receive data from client: Connection reset by peer
2022-11-27 14:45:54.072 UTC [9278] LOG: could not receive data from client: Connection reset by peer
2022-11-27 15:54:42.843 UTC [9345] LOG: could not receive data from client: Connection reset by peer
2022-11-27 15:54:42.843 UTC [9346] LOG: could not receive data from client: Connection reset by peer
2022-11-27 16:00:10.520 UTC [9363] LOG: could not receive data from client: Connection reset by peer
2022-11-27 16:08:06.495 UTC [9276] LOG: unexpected EOF on client connection with an open transaction
2022-11-27 16:24:38.935 UTC [9377] LOG: could not receive data from client: Connection timed out
2022-11-27 18:09:03.768 UTC [9496] LOG: could not receive data from client: Connection reset by peer
2022-11-27 18:09:03.768 UTC [9495] LOG: could not receive data from client: Connection reset by peer
2022-11-27 18:10:50.263 UTC [9494] LOG: could not receive data from client: Connection reset by peer
This is the kind of stuff I see in my postgres log, it does seem to me like it's a client issue but hard to tell
I'm trying to do more debugging on this and I set app.config["SQLALCHEMY_ECHO"] = "debug"
and app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {"pool_pre_ping": True, "echo_pool": "debug"}
in create_app
which should result in extremely verbose output, but I'm not seeing any of the sqlalchemy stuff logged. Is there someplace else that this output is being overridden?
@nguyenkims Are you sure that the code in server.py create_app
even does anything? After looking into this further I don't think simplelogin uses Flask-SQLAlchemy and it definitely doesn't call init_app
on a SQLAlchemy
object. I was able to get debug output by editing https://github.com/simple-login/app/blob/4.6.2-beta/app/db.py#L8 and adding echo_pool=debug
there. I left off the pool_pre_ping
option to get a baseline test but if this is where the argument should have gone it definitely explains why the previous fix didnt work
With that debug output this seems to what's causing the error to happen
2022-12-01 15:00:01,219 INFO sqlalchemy.pool.impl.QueuePool Invalidate connection <connection object at 0x7f91f8fdd800; dsn: 'user=myuser password=xxx dbname=simplelogin host=simplelogin-db port=5432 application_name=webapp', closed: 2> (reason: OperationalError:server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
)
2022-12-01 15:00:01,219 DEBUG sqlalchemy.pool.impl.QueuePool Closing connection <connection object at 0x7f91f8fdd800; dsn: 'user=myuser password=xxx dbname=simplelogin host=simplelogin-db port=5432 application_name=webapp', closed: 2>
Will now try w/ pool_pre_ping
Ok I am getting closer to figuring this out. Here's a summary of my current findings
create_app
do nothing and should be removedpool_pre_ping
to the correct place in db.py does not fix the problem. That's because pre-ping only happens during a session checkout and the failure we are observing is a dropped connection on initiation of a query (so after checkout)localhost
as their database server. I have not had the crash since doing this. So I have to conclude that this is some incompatibility in the way the docker network is functioning. However, I still don't have an explanation of exactly what is going wrong and why much less how to fix it short of the workarounds described in point (3). This is particularly odd since the official self-host instructions use docker and we are not seeing mass-reports of this error.
I will try to keep investigating this
Hi! It has been 8 days and my setup is running perfectly without any error when PostgreSQL is external to Docker.
I'll make PostgreSQL HA in the coming weeks and comment back on the matter.
Thank you @Queuecumber for your efforts :)
@Queuecumber we actually also run Postgres outside of Docker in our SaaS version. It is generally not recommended to run database inside Docker for a heavy load (it's fine for a personal usage).
i will help you to setup this application contact me https://www.fiverr.com/s/9W3PNA
Please note that this is only for bug report.
Prerequisites
Bug report
Describe the bug Self-hosted setup. If I let the SL-app logged in and running standstill, later on the next forced refresh (F5), I get an "Internal System Error" because of an error stating "Can't reconnect until invalid transaction is rolled back"
I see in the Docker log for SimpleLogin sl-app:
For PostgreSQL's log, nothing but:
After another refresh (F5), some more errors are still logged into SimpleLogin sl-app :
Expected behavior Should still be running without error or dropping "connection"
Environment (If applicable):
Additional context I can do more testing if required