spamhaus / spamassassin-dqs

Spamhaus code for the Spamassassin plugin. See https://docs.spamhaustech.com/40-real-world-usage/SpamAssassin/000-intro.html
Apache License 2.0
54 stars 15 forks source link

Make sure that "example.com" is never wrongly recognized as "example.co" due to the random order of TLDs in $self->{main}->{registryboundaries}->{valid_tlds_re} #29

Closed robertmathews closed 3 years ago

robertmathews commented 3 years ago

While testing on a message that contained "address@example.com", I noticed that running it on the same message in debug mode would sometimes say:

dbg: SHPlugin: (_get_body_emails) found email address@example.com in body

And sometimes:

dbg: SHPlugin: (_get_body_emails) found email address@example.co in body

(note "example.co" instead of "example.com").

This happens because the regexp in $self->{main}->{registryboundaries}->{valid_tlds_re} has the TLDs in a random order on each run due to Perl hash randomization. Because ".co" and ".com" are both valid TLDs, the regexp sometimes looks like "(...|co|...|com|...)" and sometimes like "(...|com|...|co|...)", and the first one matches. This makes _get_body_emails not work half the time on ".com" addresses because it looks up the wrong domain name.

The fix is taken from package Mail::SpamAssassin::Plugin::FreeMail; it makes sure that there are not more "domain name like" characters after the end of the TLDs.

ricalfieri commented 3 years ago

Are you using SA version 3.4.1 or lower? Because this should happen only in those version.

The _init_email_re functions checks SA version and, only if < 3.4.2 creates the valid_tlds_re regex. In case you use a newer version the regex is taken from the RegistryBoundaries.pm file, where it is correctly termnated like your suggestion:

my $tlds = '(?<![a-zA-Z0-9-])(?:'. # make sure tld starts at boundary join('|', keys %{$self->{conf}->{valid_tlds}}). ')(?!(?:[a-zA-Z0-9-]|.[a-zA-Z0-9]))'; # make sure it ends

So I'm expecting this bug to only show on SA < 3.4.2 . Can you confirm?

robertmathews commented 3 years ago

I'm using 3.4.2 from Debian. It looks like the change to RegistryBoundaries.pm was added in 3.4.3. That was changed 2018-10-29, but 3.4.2 was released 2018-09-16. I've downloaded a fresh copy of 3.4.2 and confirmed the fix isn't in that version.

So this bug affects SpamAssassin 3.4.2 or earlier. Maybe there could be a check for if ($sa_version < 343), and use the different regexp if so?

ricalfieri commented 3 years ago

Makes sense, I'll test your changes and merge, thanks

(edited because I was wrong in my previous assumptions:) )