I'm running this in ECS on a tiny t2.micro instance. I have a mariadb:10.3.2 container running on the same instance for the database.
I've started seeing a lot of 500s...
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: 'SELECT budgets.id AS budgets_id, budgets.is_periodic AS budgets_is_periodic, budgets.name AS budgets_name, budgets.description AS budgets_description, budgets.starting_balance AS budgets_starting_balance, budgets.current_balance AS budgets_current_balance, budgets.is_active AS budgets_is_active, budgets.is_income AS budgets_is_income, budgets.omit_from_graphs AS budgets_omit_from_graphs \nFROM budgets'] (Background on this error at: http://sqlalche.me/e/e3q8)
However, the MariaDB logs are showing messages like:
[Warning] Aborted connection 175 to db: 'XXX' user: 'XXX' host: 'XXX' (Got timeout reading communication packets)
MariaDB is incrementing Aborted_clients but not Aborted_connects, and none of the Connection_errors_* variables are increasing. Its timeouts are at the defaults: 5 seconds for connect_timeout and 28800 seconds (8 hours) for wait_timeout and interactive_timeout. The MySQL docs on communication errors describe when these warnings are logged.
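For reference, here is a quick diagnostic sketch for dumping those counters and timeout settings; it assumes pymysql is available and uses placeholder credentials:

```python
import pymysql

# Placeholder connection details; substitute the real host/user/password/db.
conn = pymysql.connect(host="127.0.0.1", user="XXX", password="XXX", database="XXX")
try:
    with conn.cursor() as cur:
        # Aborted_clients rises when the server drops an established connection
        # (e.g. wait_timeout expired); Aborted_connects tracks failed handshakes.
        cur.execute("SHOW GLOBAL STATUS LIKE 'Aborted%'")
        for name, value in cur.fetchall():
            print(name, value)

        # The timeouts that control when idle connections are dropped.
        cur.execute(
            "SHOW VARIABLES WHERE Variable_name IN "
            "('connect_timeout', 'wait_timeout', 'interactive_timeout')"
        )
        for name, value in cur.fetchall():
            print(name, value)
finally:
    conn.close()
```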
The SQLAlchemy error message points us to the Pooling docs on handling disconnects.
As far as I can tell, the problem is definitely stale connections; it seems to only happen after periods of inactivity in the application. If the app hasn't had any requests for some amount of time (anywhere from a few minutes to a few hours), this seems to happen. However, if I repeatedly browse through different views in rapid succession, there's a definite point where the 500s stop happening... presumably once every stale connection in the pool has been checked out, failed, and been replaced with a fresh one.
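A minimal sketch of how the stale-connection theory could be reproduced locally; the DSN is a placeholder, and SET SESSION wait_timeout is only used here to shorten the idle window for testing:

```python
import time
from sqlalchemy import create_engine, text

# Placeholder DSN; default pooling, no pre-ping.
engine = create_engine("mysql+pymysql://XXX:XXX@127.0.0.1/XXX")

with engine.connect() as conn:
    # Make the server drop this session after 5 idle seconds (test shortcut).
    conn.execute(text("SET SESSION wait_timeout = 5"))
    conn.execute(text("SELECT 1"))

# The connection goes back to the pool and is killed server-side while idle.
time.sleep(10)

with engine.connect() as conn:
    # Reusing the now-stale pooled connection should raise
    # OperationalError (2013, 'Lost connection to MySQL server during query').
    conn.execute(text("SELECT 1"))
```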
From the docs, it seems like this should be fixed by adding pool_pre_ping=True to our create_engine() call. If desired, this could be controlled by an environment variable.
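A sketch of what that could look like; the SQL_POOL_PRE_PING variable name and the connection URL below are hypothetical:

```python
import os
from sqlalchemy import create_engine

# Hypothetical env var to toggle the behaviour; defaults to enabled.
pre_ping = os.environ.get("SQL_POOL_PRE_PING", "true").lower() == "true"

engine = create_engine(
    "mysql+pymysql://XXX:XXX@127.0.0.1/XXX",  # placeholder URL
    pool_pre_ping=pre_ping,  # test each pooled connection with a lightweight ping on checkout
)
```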