lobsters / lobsters-ansible

Ansible playbook for lobste.rs
ISC License

fail2ban 100% CPU for 30+ minutes after reboot #40

Closed by alanpost 11 months ago

alanpost commented 5 years ago

Over the weekend I rebooted the lobste.rs DomU to add memory. After the host came back, the fail2ban-server daemon processes were CPU-bound for at least 30 minutes, potentially for hours. After enabling debug logging, I discovered it was reading auth.log. This log file has not been rotated recently and is now hundreds of megabytes in size. I infer that reading this file accounts for most of the daemon's sustained CPU usage.

There is evidence of log rotation in the past, including a logrotate rule for auth.log along with historical rotated files. I'd like to investigate why this is no longer working and reduce the time it takes for the host to warm up.
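As a starting point for that investigation, here is a minimal sketch (not something from the playbook) that flags oversized logs and reports when a rotated sibling was last written. The function name `audit_logs`, the `*.log` glob, and the 100 MB default threshold are all assumptions chosen for illustration.

```python
from pathlib import Path


def audit_logs(log_dir, max_bytes=100 * 1024 * 1024):
    """Return (name, size_bytes, rotated_count, newest_rotated_mtime) for each
    *.log file in log_dir larger than max_bytes.

    newest_rotated_mtime is None when no rotated sibling (e.g. auth.log.1,
    auth.log.2.gz) exists, which suggests rotation never ran for that file.
    """
    report = []
    for path in sorted(Path(log_dir).glob("*.log")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size <= max_bytes:
            continue
        # Rotated copies conventionally share the base name plus a suffix.
        rotated = [p for p in path.parent.glob(path.name + ".*") if p.is_file()]
        newest = max((p.stat().st_mtime for p in rotated), default=None)
        report.append((path.name, size, len(rotated), newest))
    return report
```

Calling `audit_logs("/var/log")` on the host would list each large log; an old `newest_rotated_mtime` next to a very large file is a hint about when rotation stopped working.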

pushcx commented 5 years ago

I suspect our log rotation needs a complete audit: I don't see daily rotation of app logs, etc. I haven't tried to diagnose and fix it because it's proven useful to be able to grep weeks of logs (without the hassle of bzcat or bzgrep) for sockpuppet checks, but the Right Way would be to fix all the rules and adjust them to keep more plaintext app/nginx logs.
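If compressed rotation were restored, one way to keep the multi-week grep workflow might be a small helper that reads plaintext, gzip, and bzip2 logs through the same interface. This is only a sketch using Python's standard library; `open_log` and `grep_logs` are hypothetical names, and picking the decompressor from the file suffix is an assumption about how the rotated files are named.

```python
import bz2
import gzip
import re
from pathlib import Path


def open_log(path):
    """Open a log file as text, decompressing .gz or .bz2 transparently."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    if path.suffix == ".bz2":
        return bz2.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def grep_logs(pattern, paths):
    """Yield (path, line_number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    for path in paths:
        with open_log(path) as fh:
            for lineno, line in enumerate(fh, 1):
                if regex.search(line):
                    yield str(path), lineno, line.rstrip("\n")
```

For example, `grep_logs(r"some-username", sorted(Path("/var/log/nginx").glob("access.log*")))` would search the live log and its rotated copies in one pass; the directory and file names here are illustrative, not taken from the host.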