m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Numerous pymysql connection errors #1407

Open alanbchristie opened 5 months ago

alanbchristie commented 5 months ago

Discovered while investigating #1403.

A large number of ISPyB connection failures seem to be caused by pymysql OperationalError exceptions (error 2013, "Lost connection to MySQL server during query"): -

2024-04-04T10:29:12+0000 api.remote_ispyb_connector.remote_connect():137 INFO # Started SSH server
2024-04-04T10:29:12+0000 api.remote_ispyb_connector.remote_connect():139 INFO # Connecting to ISPyB (db_user=ispyb_api_fragalysis db_name=ispyb)...
2024-04-04T10:29:13+0000 api.remote_ispyb_connector.remote_connect():159 INFO # OperationalError(2013): OperationalError(2013, 'Lost connection to MySQL server during query')
2024-04-04T10:29:14+0000 api.remote_ispyb_connector.remote_connect():159 INFO # OperationalError(2013): OperationalError(2013, 'Lost connection to MySQL server during query')
2024-04-04T10:29:15+0000 api.remote_ispyb_connector.remote_connect():159 INFO # OperationalError(2013): OperationalError(2013, 'Lost connection to MySQL server during query')
2024-04-04T10:29:16+0000 api.remote_ispyb_connector.remote_connect():159 INFO # OperationalError(2013): OperationalError(2013, 'Lost connection to MySQL server during query (timed out)')
2024-04-04T10:29:16+0000 api.remote_ispyb_connector.remote_connect():175 INFO # Connected

Most of these intermittent connection failures appear to be "handled" simply by retrying a number of times, combined with additional timeout settings (a read timeout, for example).
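A minimal sketch of the retry-with-timeouts idea follows. The helper names (`connect_with_retry`, `connect_to_ispyb`), the attempt count, the one-second pause and the 10-second timeouts are illustrative assumptions, not the values or code used in the Fragalysis backend. Only `pymysql.connect` and its `connect_timeout`/`read_timeout` parameters are real pymysql API, and the SSH tunnel is left out.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Hypothetical helper: call `connect` until it succeeds or attempts run out.

    Each failure matching `retry_on` is logged and followed by a fixed pause
    of `delay_s` seconds; the final failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            logger.info("Attempt %d/%d failed: %r", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay_s)
    raise AssertionError("not reached: the loop always returns or raises")


def connect_to_ispyb(host: str, port: int, user: str, password: str, db: str):
    """Hypothetical wrapper: open a pymysql connection with explicit timeouts."""
    import pymysql  # imported lazily so the retry helper has no hard dependency

    def _connect():
        return pymysql.connect(
            host=host,
            port=port,
            user=user,
            password=password,
            database=db,
            connect_timeout=10,  # assumed value: seconds to establish the connection
            read_timeout=10,     # assumed value: seconds allowed for each read
        )

    return connect_with_retry(_connect, retry_on=(pymysql.err.OperationalError,))
```

Keeping the retry loop separate from the connection call means it can be exercised without a database. Retrying only `OperationalError` lets programming errors (a `TypeError` from a bad argument, say) surface immediately rather than being masked.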

alanbchristie commented 5 months ago

After analysis, a number of adjustments have been made to the code: -

With these changes in place, an experimental deployment appears to "mask" 14 of 16 connection problems (an 87.5% reduction in false alarms).

alanbchristie commented 5 months ago

The improved SSH-tunnel/MySQL handler is present in the latest backend and in 2024.04.1. It now logs MySQL connection errors, which are the source of most of our "connection issues".
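One way to produce messages in the `OperationalError(2013): OperationalError(2013, '...')` shape seen in the `remote_connect()` log lines is sketched below. `describe_db_error` is a hypothetical helper, not the backend's actual code. It relies only on the fact that pymysql raises its errors with the MySQL error number as the first argument.

```python
def describe_db_error(exc: BaseException) -> str:
    """Hypothetical helper: render an exception as 'Name(code): repr(exc)'.

    pymysql errors carry the MySQL error number (e.g. 2013) as args[0];
    when there is no integer code, only the class name is used as the prefix.
    """
    name = type(exc).__name__
    code = exc.args[0] if exc.args and isinstance(exc.args[0], int) else None
    prefix = f"{name}({code})" if code is not None else name
    return f"{prefix}: {exc!r}"
```

Because the code is pulled out explicitly, log lines can be grouped by error number (2013 "lost connection" versus, say, an authentication failure) without parsing the free-text message.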