smtpd / qpsmtpd

qpsmtpd is a flexible smtpd daemon written in Perl
http://smtpd.github.io/qpsmtpd/
MIT License

Lack of per-plugin timeout makes transactions time out (w/patch) #294

Open yitzhaq opened 4 years ago

yitzhaq commented 4 years ago

It seems this is an old and well-known issue (even acknowledged by @abh back in the day as a "pretty serious bug"), but I couldn't find a report for it in here, so it seems worth raising. Googling "qpsmtpd timeout" turns up plenty of threads to read through.

AFAICT there is no mechanism for defining a timeout per plugin; only a general timeout (which from observation seems to be a good 600s) appears to apply. This can have particularly nasty consequences when something third-party, called by a plugin, experiences locking issues or similar. Typical examples would be SpamAssassin, any virus scanner, DSPAM etc.

The unfortunate common manifestation of this seems to be that the other end, after waiting up to ten minutes for any response, drops the connection and concludes that delivery has failed, even though delivery will eventually succeed once the timeout is reached. Delivery then gets retried, producing a message with duplicate content but usually different enough otherwise to slip past most duplicate detection. If the issue repeats itself, this leads to another duplicate message, rinse and repeat. On top of this, these stalled connections can cause qpsmtpd to run out of available connections, causing further delivery issues. Not a pretty sight, and worst case this can bring an MTA to its knees when third-party software experiences issues.

Back in 2008 @vetinari posted a proposed plugin to configure per-plugin timeouts, which I believe would be the way to go here, and a superior approach to adding timeout functionality to each and every plugin. A few core changes are however necessary to support this mechanism (also in the patch), so it's not quite just a drop-in fix. His proposal received no comments, so I don't know to what extent any of this has been sanity checked.
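To illustrate the general idea only (this is not @vetinari's patch, and not qpsmtpd code, which is Perl), here is a minimal Python sketch of a central wrapper that enforces a per-plugin time limit around each hook call. The names `run_with_timeout`, `PLUGIN_TIMEOUTS` and `DEFAULT_TIMEOUT`, and the limits themselves, are made up for the example. It relies on `SIGALRM`, so it assumes Unix and a single-threaded caller.

```python
import signal
import time


class PluginTimeout(Exception):
    """Raised inside a hook when it exceeds its time budget."""


# Hypothetical per-plugin limits in seconds (illustrative values only).
PLUGIN_TIMEOUTS = {"spamassassin": 60, "virus_scan": 45}
DEFAULT_TIMEOUT = 30


def run_with_timeout(plugin_name, hook, *args, timeouts=PLUGIN_TIMEOUTS):
    """Run hook(*args), aborting it once the plugin's limit has elapsed.

    Returns ("ok", result) or ("timeout", None).
    Assumes Unix and that it is called from the main thread.
    """
    limit = timeouts.get(plugin_name, DEFAULT_TIMEOUT)

    def on_alarm(signum, frame):
        # Interrupt whatever the hook is currently blocked on.
        raise PluginTimeout(plugin_name)

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, limit)
    try:
        return ("ok", hook(*args))
    except PluginTimeout:
        return ("timeout", None)
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    # A hook that would block for 2s is cut off after 0.2s.
    print(run_with_timeout("demo", time.sleep, 2, timeouts={"demo": 0.2}))
```

The point of putting the limit in one wrapper is that individual plugins need no timeout logic of their own; what the server does on `"timeout"` (temporary failure, skip the plugin, etc.) is a separate policy decision.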

It would be wonderful if someone (which if we're being realistic at this point probably means @msimerson) would be willing to review Hanno's code, apply whatever polish feels necessary, and if it works well, merge it.

msimerson commented 3 years ago

Having seen this same issue in both Qpsmtpd and Haraka, I don't think that per-plugin timeouts are the correct answer. A more robust solution is for the MTA to verify that the remote is still connected immediately before calling the queue plugin(s). If there's nobody at the other end, we won't be able to inform them of queue success or failure and thus should discard the message.
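As a rough sketch of what such a check could look like (Python, not actual Qpsmtpd or Haraka code; the function name `peer_still_connected` is hypothetical), one can poll the client socket without blocking and peek for end-of-file just before queueing. It assumes a Unix-like system, since it uses `MSG_DONTWAIT`.

```python
import select
import socket


def peer_still_connected(sock):
    """Best-effort check that the remote end has not closed the connection.

    Assumes a Unix-like system (MSG_DONTWAIT) and a connected stream socket.
    """
    # Zero timeout: ask whether the socket is readable right now.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return True  # nothing pending, so no EOF or reset seen so far
    try:
        # Peek so that any pending bytes stay in the receive buffer.
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return True
    except OSError:
        return False  # e.g. connection reset by peer
    # An empty read on a readable socket means the peer closed its end.
    return data != b""
```

A caller would run this right before the queue hook and drop the transaction when it returns `False`. It is only best effort: unread bytes from the client hide a later close, a peer that vanished without sending FIN or RST still looks connected, and a client that merely half-closed its sending side is reported as gone.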