shatteredsilicon / ssm-submodules

GNU Affero General Public License v3.0
1 stars 2 forks source link

Slow log rotation bug #342

Open gordan-bobic opened 2 weeks ago

gordan-bobic commented 2 weeks ago

Under high load, log rotation goes wrong:

Nov 14 01:59:00 dev ssm-mysql-queries[2424367]: 2024/11/14 01:59:00.003148 WARNING qan-interval stat /var/lib/mysql/slow.log: no such file or directory

And that in turn breaks query harvesting because the base slow.log no longer exists (but rotated files do. Oddly, flush logs in mysql shell doesn't seem to get the slow.log re-created.

This seems to happen under extremely high load, slow logs get significantly oversized, they should be rotated after each GB:

-rw-rw----. 1 mysql mysql 4.7G Nov 14 01:50 /var/lib/mysql/slow.log-1731541839
-rw-rw----. 1 mysql mysql 5.0G Nov 14 02:10 /var/lib/mysql/slow.log-1731543030

For testing, I generated the load using:

sysbench oltp_read_write --table-size=1000000 --threads=32 --time=86400 --db-driver=mysql --mysql-db=test run

This seems to happen with MariaDB 10.5.27. With 10.6.20, it seems to produce a lot of this:

[root@dev /var/log]# tail -f messages | grep qan
Nov 14 02:30:54 dev ssm-mysql-queries[2970254]: 2024/11/14 02:30:54.004858 WARNING qan-analyzer-mysql-9f9521f8-worker Timeout parsing /var/lib/mysql/slow.log 152433581-915701237: 72.2% 660966394/915701237 763267656 54.0s
Nov 14 02:32:00 dev ssm-mysql-queries[2970254]: 2024/11/14 02:32:00.007361 WARNING qan-analyzer-mysql-9f9521f8 Skipping interval '6 /var/lib/mysql/slow.log 2024-11-14 00:31:00 UTC to 2024-11-14 00:32:00 UTC (0-124364369)' because interval '5 /var/lib/mysql/slow.log 2024-11-14 00:30:00 UTC to 2024-11-14 00:31:00 UTC (915701237-1678061887)' is still being parsed
Nov 14 02:33:00 dev ssm-mysql-queries[2970254]: 2024/11/14 02:33:00.004998 WARNING qan-analyzer-mysql-9f9521f8 Skipping interval '7 /var/lib/mysql/slow.log 2024-11-14 00:32:00 UTC to 2024-11-14 00:33:00 UTC (124364369-891628602)' because interval '5 /var/lib/mysql/slow.log 2024-11-14 00:30:00 UTC to 2024-11-14 00:31:00 UTC (915701237-1678061887)' is still being parsed
Nov 14 02:34:00 dev ssm-mysql-queries[2970254]: 2024/11/14 02:34:00.003638 WARNING qan-analyzer-mysql-9f9521f8 Skipping interval '8 /var/lib/mysql/slow.log 2024-11-14 00:33:00 UTC to 2024-11-14 00:34:00 UTC (891628602-1662080490)' because interval '5 /var/lib/mysql/slow.log 2024-11-14 00:30:00 UTC to 2024-11-14 00:31:00 UTC (915701237-1678061887)' is still being parsed
Nov 14 02:35:00 dev ssm-mysql-queries[2970254]: 2024/11/14 02:35:00.007736 WARNING qan-analyzer-mysql-9f9521f8 Skipping interval '9 /var/lib/mysql/slow.log 2024-11-14 00:34:00 UTC to 2024-11-14 00:35:00 UTC (1662080490-2419821122)' because interval '5 /var/lib/mysql/slow.log 2024-11-14 00:30:00 UTC to 2024-11-14 00:31:00 UTC (915701237-1678061887)' is still being parsed

but it looks like when it times out, it doesn't upload any data. It looks like once it times out, it never uploads anything after that. Like a status check gets stuck, but the harvesting aborts. That "interval 5" stays there forever.

Tested with ssm-client 9.4.1-11

Note: Special handling of slow log rotation is required until this bug is fixed: https://jira.mariadb.org/browse/MDEV-35444

gordan-bobic commented 3 days ago

Still seems to be happening in ssm-client-9.4.1-13 from 26/11.