A few log files are failing to ingest due to a connection error:
2023-08-18 20:23:38,823 - INFO: Ingesting /var/log/lmod.log-20230702
2023-08-18 20:23:38,823 - INFO: Fetching database connection details
2023-08-18 20:23:38,851 - INFO: Parsing log data
2023-08-18 20:23:41,625 - INFO: Ingesting data into database table lmod_tracking.log_data
2023-08-19 04:36:54,377 - ERROR: (mysql.connector.errors.OperationalError) 2013 (HY000): Lost connection to MySQL server during query
[SQL: INSERT INTO log_data (logname, time, host, user, module, path, `package`, version) VALUES (...) (Background on this error at: https://sqlalche.me/e/20/e3q8)
2023-08-19 04:36:56,520 - INFO: Ingesting /var/log/lmod.log-20230703
2023-08-19 04:36:56,520 - INFO: Fetching database connection details
2023-08-19 04:36:56,558 - INFO: Parsing log data
2023-08-19 04:37:02,153 - INFO: Ingesting data into database table lmod_tracking.log_data
2023-08-20 16:06:08,114 - ERROR: (mysql.connector.errors.OperationalError) 2013 (HY000): Lost connection to MySQL server during query
[SQL: INSERT INTO log_data (logname, time, host, user, module, path, `package`, version) VALUES (...) (Background on this error at: https://sqlalche.me/e/20/e3q8)
The link provided by the error message has this to say:
OperationalError
Exception raised for errors that are related to the database’s operation and not necessarily under the control of the programmer, e.g. an unexpected disconnect occurs, the data source name is not found, a transaction could not be processed, a memory allocation error occurred during processing, etc.
This error is a DBAPI Error and originates from the database driver (DBAPI), not SQLAlchemy itself.
The OperationalError is the most common (but not the only) error class used by drivers in the context of the database connection being dropped, or not being able to connect to the database. For tips on how to deal with this, see the section Dealing with Disconnects.
Further documentation talks about pre ingestion health checks and refreshing stale connection pools - neither of which should be relevant here.
The files being ingested are the two largest log files and error after several hours.
At first I thought this might be a timeout error, but the first file error after 6 hours and the second after 15.5. I expect a simple timeout error to be more consistent.
The next largest log files are 11M which ingest without issue:
root root 11M Jun 20 03:45 lmod.log-20230620
root root 11M Jun 21 03:31 lmod.log-20230621
In general, these ingestion times are problematically long. Speeding up the ingestion (not sure how) might alleviate the symptoms, but I'd like to understand the core underlying problem.
A few log files are failing to ingest due to a connection error:
The link provided by the error message has this to say:
Further documentation talks about pre ingestion health checks and refreshing stale connection pools - neither of which should be relevant here.
The files being ingested are the two largest log files and error after several hours.
At first I thought this might be a timeout error, but the first file error after 6 hours and the second after 15.5. I expect a simple timeout error to be more consistent.
The next largest log files are 11M which ingest without issue:
In general, these ingestion times are problematically long. Speeding up the ingestion (not sure how) might alleviate the symptoms, but I'd like to understand the core underlying problem.