spamhaus / spamassassin-dqs

Spamhaus code for the SpamAssassin plugin. See https://docs.spamhaustech.com/40-real-world-usage/SpamAssassin/000-intro.html
Apache License 2.0

SH_BODYURI_REVERSE_SBL triggers for fonts.googleapis.com #68

Closed: xrat closed this issue 6 months ago

xrat commented 6 months ago

In the past, SH_BODYURI_REVERSE_SBL triggered because IP addresses of fonts.googleapis.com were listed. It's unclear why these IP addresses got listed in the first place; however, that's not the issue here. There should be a way to avoid false positives caused by URIs that are obviously in generic use, such as fonts.googleapis.com, fonts.gstatic.com, schemas.microsoft.com, or www.w3.org, to name a few candidates.

Note that the default spam threshold in SpamAssassin is 5. The score this project currently assigns to SH_BODYURI_REVERSE_SBL is 8, so this rule alone exceeds the threshold: https://github.com/spamhaus/spamassassin-dqs/blob/666784cf94d8d17e3e2ffc09e95116aad5246af4/3.4.1%2B/sh_scores.cf#L7

BTW, one recently affected message was:

Date: Thu, 01 Feb 2024 00:10:05 +0000
From: Spamhaus Technology <notification@service.spamhaus.com>
Reply-To: datafeed-accounts@spamhaus.com
Subject: Annual Account Notification

It seems to me that at least URIs of some well-known, frequently used remote font hosts, and schema URIs in general, should be excluded. This would also save some load on Spamhaus' DNS infrastructure.

ricalfieri commented 6 months ago

Hi, this is a listing issue, not a plugin's one. I escalated internally but this is not the right place to raise issues about possible FPs.

xrat commented 6 months ago

Can we at least agree that schema URIs should not be checked?

ricalfieri commented 6 months ago

No, because they could change at any time and we do not maintain a list of them.

xrat commented 6 months ago

I don't mean a list but the extraction algorithm. As the plugin currently operates, it extracts and then checks URIs of remote fonts (I get that, though I see potential and actual problems) and schema URIs, such as those for XML, that point at www.w3.org. I've never heard of schema URIs being exploited or being any reasonable indicator of spamminess.
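
To make the distinction concrete, here is a minimal Python sketch of a hypothetical body-URI collector that keeps fetchable href/src targets but ignores xmlns namespace declarations such as http://www.w3.org/1999/xhtml. It is not the plugin's code (SH.pm is Perl, and its extraction logic may differ); the class and function names are made up for illustration. Remote-font URIs are still collected, since the proposal here only concerns schema URIs.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical choice: attributes whose values point at resources a client fetches.
URI_ATTRS = {"href", "src"}


class UriCollector(HTMLParser):
    """Collect hostnames from href/src attributes, skipping xmlns declarations."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # xmlns and xmlns:prefix carry namespace identifiers, not links to fetch.
            if name == "xmlns" or name.startswith("xmlns:"):
                continue
            if name in URI_ATTRS and value:
                host = urlsplit(value).hostname
                if host:
                    self.hosts.append(host)


def body_hosts(html):
    """Return hostnames that a DNS-based body-URI check would look up (sketch)."""
    collector = UriCollector()
    collector.feed(html)
    collector.close()
    return collector.hosts


sample = ('<html xmlns="http://www.w3.org/1999/xhtml">'
          '<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">'
          '<a href="https://example.com/">x</a></html>')
print(body_hosts(sample))  # ['fonts.googleapis.com', 'example.com']
```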

xrat commented 6 months ago

> Hi, this is a listing issue, not a plugin's one. I escalated internally but this is not the right place to raise issues about possible FPs.

Thanks for escalating this internally. However, I have now come to the conclusion that I must disagree with closing this issue as a "listing issue". Here's why:

  1. As fellow mailop Bernhard Lichtinger pointed out today on the mailop mailing list (where aspects of this issue are also being discussed), "IPs of fonts.googleapis.com got listed on SBL because these IPs are also used to serve firebasestorage.googleapis.com." I just verified this at https://check.spamhaus.org/listed/?searchterm=216.58.212.170. This listing is not disputed: we all agree that the IP is listed for good reasons. It is how this plugin operates and how it is set up that causes the problems.
  2. Many legitimate senders have no choice about what HTML code their MSP or MUA produces. As things currently stand, if it uses remote fonts hosted on fonts.googleapis.com, their messages will very likely be flagged as spam by the plugin. On a low-volume server where I am currently testing the plugin, the FP rate is 5%.
  3. fonts.googleapis.com resolves to different IPs from time to time (the TTL is currently 300 s) due to some kind of round-robin DNS. Assuming that not all of those IPs are SBL-listed, this again shows that the problem is not the listing but how the plugin operates.
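
Point 3 can be illustrated with a simplified model of an A-record-based body-URI check. This is a hypothetical Python sketch, not SH.pm's check_sh_bodyuri_a(): it only assumes that the rule fires when any address the resolver returns for the URI host is listed. 216.58.212.170 is the address mentioned in point 1; the 192.0.2.x addresses are documentation placeholders standing in for unlisted answers.

```python
# Hypothetical data: 216.58.212.170 is the address reported as SBL-listed in
# this thread; 192.0.2.x are documentation placeholders for unlisted answers.
SBL_LISTED = {"216.58.212.170"}

# Answers a round-robin resolver might hand out for the same hostname across
# deliveries a few minutes apart (TTL 300 s).
ANSWERS_OVER_TIME = [
    ["192.0.2.10"],
    ["216.58.212.170"],
    ["192.0.2.20"],
]


def reverse_sbl_hit(resolved_ips, listed=SBL_LISTED):
    """Simplified model: fires if any resolved address is on the list."""
    return any(ip in listed for ip in resolved_ips)


verdicts = [reverse_sbl_hit(ips) for ips in ANSWERS_OVER_TIME]
print(verdicts)  # [False, True, False]
```

The same message with the same font URI gets different verdicts depending only on which answer the resolver happens to return.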

robert-scheck commented 6 months ago

https://github.com/spamhaus/spamassassin-dqs/blob/666784cf94d8d17e3e2ffc09e95116aad5246af4/3.4.1%2B/SH.pm#L707

does not work, @ricalfieri. SH_BODYURI_REVERSE_SBL uses check_sh_bodyuri_a() which aims to use a skip list via

https://github.com/spamhaus/spamassassin-dqs/blob/666784cf94d8d17e3e2ffc09e95116aad5246af4/3.4.1%2B/SH.pm#L697

which suggests that it is filled via uridnsbl_skip_domains. Unfortunately, this has absolutely no effect (in my SpamAssassin configuration the setting is actually uridnsbl_skip_domain googleapis.com fonts.googleapis.com). My expectation is quite simple: please actually skip the domain, as the SH.pm code leads admins to believe.
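
For comparison, a skip check that honours parent domains would skip fonts.googleapis.com (and other googleapis.com hosts) whenever googleapis.com is configured. The following is a hypothetical Python sketch of such a label-wise suffix match; it is not the SH.pm code, and is_skipped() is an illustrative name only.

```python
def is_skipped(host, skip_domains):
    """Return True if host or any parent domain is in skip_domains (hypothetical helper)."""
    labels = host.lower().rstrip(".").split(".")
    # For "fonts.googleapis.com" this checks "fonts.googleapis.com",
    # then "googleapis.com", then "com".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in skip_domains:
            return True
    return False


# Mirrors the uridnsbl_skip_domain line quoted above.
skip = {"googleapis.com", "fonts.googleapis.com"}
hosts = ["fonts.googleapis.com", "maps.googleapis.com", "example.net"]
print([h for h in hosts if not is_skipped(h, skip)])  # ['example.net']
```

Matching on whole labels means a host like notgoogleapis.com is not skipped by accident.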

xrat commented 6 months ago

Thanks, robert-scheck. I was about to open a new issue about $skip_domains next. I think it's a separate bug. Would you like to open a new issue for it?

robert-scheck commented 6 months ago

> I think it's a separate bug. Would you like to open a new issue for it?

Thank you, done.

xrat commented 6 months ago

Just saw even more FPs for maps.googleapis.com and notifications.googleapis.com.