It seems _get_domains_from_body_emails doesn't check the uri_to_domain result to skip obviously invalid TLDs/domains.
Also, it's always a bad idea to use unbounded ([foo]+)+ style matching:
\b([\w\d_-+.]+\@(?:[\w\d-]+.)+[\w\d-]{2,10})\b/g
An easy improvement would be to limit the username length, and the length and count of hostname components. Valid TLDs could also be checked directly in the regex (see SpamAssassin FreeMail.pm / HashBL.pm).
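To illustrate the suggestion, here is a minimal sketch in Python (not the plugin's Perl, and not a proposed patch). The limits, the tiny TLD list and the extract_emails name are all assumptions made up for this example. A real fix would use the full valid TLD list, as FreeMail.pm / HashBL.pm do.

```python
import re

# Hypothetical limits, chosen only for illustration.
MAX_LOCAL = 64    # max username (local part) length
MAX_LABEL = 63    # max length of one hostname component
MAX_LABELS = 4    # max hostname components before the TLD

# Tiny illustrative list; a real check would use the full valid TLD list.
TLDS = ("com", "net", "org", "info", "de", "uk")

EMAIL_RE = re.compile(
    # Must not start in the middle of a longer username.
    r"(?<![\w.+-])"
    # Every quantifier is bounded, so backtracking stays bounded too.
    r"([\w.+-]{1,%d}@(?:[a-z0-9-]{1,%d}\.){1,%d}(?:%s))"
    # The TLD must end the hostname (a trailing sentence dot is fine).
    r"(?![\w-])(?!\.\w)"
    % (MAX_LOCAL, MAX_LABEL, MAX_LABELS, "|".join(map(re.escape, TLDS))),
    re.IGNORECASE,
)


def extract_emails(text):
    """Return lowercased addresses whose TLD is in the allowed list."""
    return [m.group(1).lower() for m in EMAIL_RE.finditer(text)]


if __name__ == "__main__":
    sample = "Contact bob.smith@mail.example.com or eve@bad.invalidtld, also x@foo.org."
    print(extract_emails(sample))
```

With bounded repetition and the TLD alternation inside the pattern, addresses with unknown TLDs never match, so there is nothing left to filter afterwards, and a long run of username or hostname characters cannot trigger heavy backtracking.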