spamhaus / spamassassin-dqs

Spamhaus code for the Spamassassin plugin. See https://docs.spamhaustech.com/40-real-world-usage/SpamAssassin/000-intro.html
Apache License 2.0

Improve email parser #4

Closed: hege-li closed this issue 5 years ago

hege-li commented 5 years ago

It seems _get_domains_from_body_emails doesn't check the uri_to_domain result, so obviously invalid TLDs/domains are not skipped.
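To illustrate the kind of check meant here, the following is a minimal Python sketch (the plugin itself is Perl). It is not the plugin's code: `VALID_TLDS` and `filter_valid_domains` are hypothetical names, and the TLD set is a deliberately tiny placeholder for a maintained list.

```python
# Hypothetical, deliberately tiny TLD set; a real filter would use a maintained list.
VALID_TLDS = {"com", "net", "org", "io", "it"}


def filter_valid_domains(candidates):
    """Keep only candidates that look like 'name.tld' with a known TLD."""
    valid = []
    for domain in candidates:
        labels = domain.lower().rstrip(".").split(".")
        # Require at least two labels, no empty labels, and a recognized TLD.
        if len(labels) >= 2 and all(labels) and labels[-1] in VALID_TLDS:
            valid.append(".".join(labels))
    return valid
```

Anything extracted from an email address in the body would pass through such a filter before being queried, so junk like a bare hostname or an unknown TLD is dropped early.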

Also, it's always a bad idea to use unbounded ([foo]+)+-style matching, since nested unbounded quantifiers can backtrack heavily on crafted input: /\b([\w\d_\-+.]+\@(?:[\w\d-]+\.)+[\w\d-]{2,10})\b/g An easy improvement would be to limit the length of the username and hostname components and the number of hostname components. The regex could also check for valid TLDs directly (see SpamAssassin FreeMail.pm / HashBL.pm).
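A bounded variant could look roughly like the Python sketch below. This is an illustration only, not the code from FreeMail.pm or HashBL.pm: the limits (64 characters for the username, 63 per hostname label, at most 4 labels) and the short `TLDS` tuple are assumptions chosen for the example, and `EMAIL_RE` / `extract_emails` are hypothetical names.

```python
import re

# Hypothetical short TLD list, embedded directly in the pattern.
TLDS = ("com", "net", "org", "io", "it")

EMAIL_RE = re.compile(
    r"(?<![\w.+-])"                  # not preceded by a username character
    r"([\w.+-]{1,64}"                # username: bounded length
    r"@"
    r"(?:[a-z0-9-]{1,63}\.){1,4}"    # 1 to 4 hostname labels, each bounded
    r"(?:" + "|".join(TLDS) + r"))"  # TLD must be one of the listed ones
    r"(?![\w-])",                    # not followed by more hostname characters
    re.IGNORECASE,
)


def extract_emails(body):
    """Return lowercased email addresses found in body text."""
    return [match.lower() for match in EMAIL_RE.findall(body)]
```

Because every repetition has an upper bound, the work done per starting position is capped, and addresses with an unknown TLD never match in the first place.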

ricalfieri commented 5 years ago

Thanks for the suggestion. I replaced my code with some taken from HashBL.