spamhaus / spamassassin-dqs

Spamhaus code for the Spamassassin plugin. See https://docs.spamhaustech.com/40-real-world-usage/SpamAssassin/000-intro.html
Apache License 2.0

Make gethostbyname lookups asynchronous #14

robertmathews closed 4 years ago

robertmathews commented 4 years ago

Since switching to the Spamhaus DQS plugin, I've noticed that it takes SpamAssassin far longer to scan some messages containing many URLs in the body. For example, the message at https://gist.github.com/robertmathews/47223b49aab854ad5a7d046f139c77c8 takes several minutes to scan:


    time spamassassin -t < spamassassin-pathological
    ...
    real 6m26.634s
    user 0m4.635s
    sys 0m0.198s

I traced this problem to the direct, synchronous "gethostbyname" calls in the SH.pm code. Each call can hang for up to 30 seconds on a domain name with nonworking nameservers, and because the calls run one after another, those delays add up.

To fix this, I replaced the gethostbyname calls with asynchronous lookups and callbacks, the same way the included SpamAssassin "URIDNSBL.pm" does. Now the same message scans in about 4 seconds, as I'd expect, and produces identical scores:


    time spamassassin -t < spamassassin-pathological
    ...
    real 0m3.996s
    user 0m2.732s
    sys 0m0.116s
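
To illustrate why the change helps, here is a minimal sketch of the general idea in Python. It is not the actual patch (the plugin is Perl and uses SpamAssassin's own async machinery), and it performs no real DNS queries: the `resolve` function, the domain names, the delays, and the scaled-down `TIMEOUT` are all hypothetical stand-ins. It only shows that waiting for lookups one at a time costs roughly the sum of the timeouts, while starting them all together costs roughly one timeout, with the same results either way.

```python
import asyncio
import time

# Hypothetical simulated response times in seconds (no real DNS is involved).
SIMULATED_DELAY = {
    "good.example": 0.01,
    "slow.example": 0.05,
    "dead1.example": 0.2,  # stands in for a nameserver that never answers in time
    "dead2.example": 0.2,
    "dead3.example": 0.2,
}
TIMEOUT = 0.1  # scaled-down stand-in for the 30-second hang described above


async def resolve(name):
    """Pretend to resolve a name by sleeping for its simulated delay."""
    await asyncio.sleep(SIMULATED_DELAY[name])
    return "192.0.2.1"  # documentation-range address, purely illustrative


async def lookup(name):
    """Resolve one name, giving up after TIMEOUT; None means no answer."""
    try:
        return name, await asyncio.wait_for(resolve(name), TIMEOUT)
    except asyncio.TimeoutError:
        return name, None


async def scan_sequential(names):
    # Mimics blocking calls: each lookup must finish before the next starts.
    return [await lookup(n) for n in names]


async def scan_concurrent(names):
    # Starts every lookup at once and collects the results as they complete.
    return await asyncio.gather(*(lookup(n) for n in names))


def timed(scan, names):
    """Run a scan and return (results as a dict, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(scan(names))
    return dict(results), time.perf_counter() - start


if __name__ == "__main__":
    names = list(SIMULATED_DELAY)
    seq_results, seq_time = timed(scan_sequential, names)
    con_results, con_time = timed(scan_concurrent, names)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
    print("same results:", seq_results == con_results)
```

In this toy setup the three unresponsive names each cost a full timeout in the sequential scan, whereas the concurrent scan waits for all of them at the same time.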

ricalfieri commented 4 years ago

Thanks!

I'm going to test these changes and eventually merge them if everything looks OK.