mailwatch / MailWatch

MailWatch for MailScanner is a web-based front-end to MailScanner
http://mailwatch.org/
GNU General Public License v2.0

mailwatch_milter_relay: very long execution time and high sql load #1265

Open dneuhaeuser opened 1 year ago

dneuhaeuser commented 1 year ago

When running 'mailwatch_milter_relay.php', I observed that a run with '--refresh' (e.g. via cron) takes longer and longer as mail.log grows, rising from several minutes to, ultimately, hours. During this time the mysql process is under very high load.

Looking at the script, I wonder whether the code from line 143 onward really needs to be executed for each and every line in the logfile (i.e. on each execution of the function 'process_entries'). https://github.com/mailwatch/MailWatch/blob/ca1d81b4f2e5f788d70a84ae2daabc79cc330220/tools/Postfix_relay/mailwatch_milter_relay.php#L143

I believe it should be sufficient to run lines 143-163 only ONCE, AFTER all logfile lines have been processed. All the necessary information for this is in the $idqueue array, right?
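To illustrate the proposal, here is a minimal Python sketch (not the script's actual PHP code) that compares writing the queue to the database after every log line with writing it once at the end. Only the name `idqueue` comes from the script; the log format, the `relay` table and the helpers `parse`, `flush` and `run` are hypothetical stand-ins.

```python
import sqlite3

# Hypothetical log lines of the form "<queue_id> <status>".
# The real mail.log format is different; this is only a stand-in.
LOG_LINES = [
    "A1 queued",
    "B2 queued",
    "A1 sent",
    "C3 queued",
    "B2 bounced",
]


def parse(line):
    """Split a stand-in log line into (queue_id, status)."""
    queue_id, status = line.split(maxsplit=1)
    return queue_id, status


def flush(conn, idqueue):
    """Write every entry currently in idqueue; return the number of rows written."""
    conn.executemany(
        "INSERT OR REPLACE INTO relay (queue_id, status) VALUES (?, ?)",
        list(idqueue.items()),
    )
    return len(idqueue)


def run(lines, flush_every_line):
    """Process all lines; return (total rows written, final table contents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE relay (queue_id TEXT PRIMARY KEY, status TEXT)")
    idqueue = {}
    rows_written = 0
    for line in lines:
        queue_id, status = parse(line)
        idqueue[queue_id] = status
        if flush_every_line:
            # Per-line variant: the whole queue is rewritten for each log line.
            rows_written += flush(conn, idqueue)
    if not flush_every_line:
        # Proposed variant: a single pass over the queue after parsing.
        rows_written += flush(conn, idqueue)
    final = dict(conn.execute("SELECT queue_id, status FROM relay"))
    conn.close()
    return rows_written, final


if __name__ == "__main__":
    print(run(LOG_LINES, flush_every_line=True))
    print(run(LOG_LINES, flush_every_line=False))
```

In this toy setup both variants leave the table in the same final state, but the per-line variant touches the database far more often, and that gap widens as the log grows.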

shawniverson commented 1 year ago

Hi @dneuhaeuser, that would make for a good optimization, I believe. This process has never been super efficient. I am open to ideas.

dneuhaeuser commented 1 year ago

I'm currently working on improving this script's performance...

On an installation with about 500 emails in the mail.log (file size ~2.5 MB), running mailwatch_milter_relay currently takes 36 minutes here, with 100% CPU load on mysqld.

shawniverson commented 1 year ago

Just bear in mind that under normal conditions --refresh should only be run once, preferably after a logrotate. I look forward to your improvements.

dneuhaeuser commented 1 year ago

Please take a look at PR #1266.

It is a very simple but effective change: the execution that took 36+ minutes before now completes in about 10 seconds.

There is probably still room for more optimization...

Note that the cron file ('tools/Postfix_relay/mailwatch-milter-relay') executes the script hourly with --refresh.

EDIT: there is actually a suggestion for the 'tail' mode in 'INSTALL.milter' https://github.com/mailwatch/MailWatch/blob/ca1d81b4f2e5f788d70a84ae2daabc79cc330220/tools/Postfix_relay/INSTALL.milter#L33

I wonder why this method is not used in the suggested cron-file https://github.com/mailwatch/MailWatch/blob/ca1d81b4f2e5f788d70a84ae2daabc79cc330220/tools/Postfix_relay/mailwatch-milter-relay#L12
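For context, the general difference between a full refresh and a tail-style approach can be sketched as follows: instead of re-reading the whole log on every run, remember how far the file was read and process only what has been appended since. This is a Python illustration under stated assumptions; the helper `read_new_lines` is hypothetical, and the script's actual tail mode may work differently.

```python
import os


def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset).

    Hypothetical helper: `offset` is the byte position reached by the
    previous call (0 on the first call).
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        if size < offset:
            # The file is shorter than before: assume it was rotated or
            # truncated and start again from the beginning.
            offset = 0
        fh.seek(offset)
        data = fh.read()
    # Consume only complete lines; a partially written last line is
    # left for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Each call then costs time proportional to the newly appended data rather than to the size of the whole log, which is why a tail-style setup avoids the growth in run time described above.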

shawniverson commented 1 year ago

@dneuhaeuser you are right, we need to fix that discrepancy. I actually use a systemd unit for this. It may make even more sense for us to include both a cron with the tail method and a systemd unit file that also uses the tail method.