Closed garethrees closed 8 years ago
Useful docs here:
(Note: SpamAssassin support is available in Alaveteli version 0.22.2.0 and up.)
On FragDenStaat.de (which is not running Alaveteli) we are checking if the apex domain of the incoming "From" email address is equal to the public body's domain, the FOI commissioner's domain or to a domain of any other received email address in the thread.
However, this check is occasionally too strict and has potential for false positives. The messages are not discarded but marked for review as possible spam.
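For anyone wanting to try something similar, here is a minimal sketch of that kind of sender-domain check in Python. It is not FragDenStaat's actual code: the function names are made up for illustration, and the apex extraction naively takes the last two labels, which is wrong for suffixes like `co.uk` (a real check would probably consult the Public Suffix List).

```python
from email.utils import parseaddr


def apex_domain(address: str) -> str:
    """Return a naive apex domain (last two labels) for an email address.

    Assumption: two labels are enough. This is wrong for suffixes such as
    "co.uk"; a real check would consult the Public Suffix List.
    """
    _, addr = parseaddr(address)
    domain = addr.rpartition("@")[2].lower().rstrip(".")
    labels = domain.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else domain


def needs_review(from_header, public_body_domain, commissioner_domain, thread_addresses):
    """Return True when the sender's apex domain matches none of the trusted ones.

    Flagged messages are meant to be marked for review, not discarded.
    """
    trusted = {public_body_domain.lower(), commissioner_domain.lower()}
    # Any address already seen in the thread also counts as trusted.
    trusted.update(apex_domain(a) for a in thread_addresses)
    trusted.discard("")
    sender = apex_domain(from_header)
    return sender == "" or sender not in trusted
```

Mail from a subdomain of the public body passes, while mail from an unrelated domain (or with an unparseable "From") is flagged for a human to look at.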
On FYI.org.nz we have a very low level of spam (probably as a side effect of NZ authorities sending most responses - even letters - as scans to PDF). That was until recently, when a particular batch of spam started coming in using addresses which could only have been OCR'd from the PDFs. So far setting all those requests to allow from authority_only and handle with holding_pen has taken care of the problem. But I expect this problem to return as OCR harvesting increases and our agencies start sending more machine-readable text.
My next step will be SPF and DKIM.
A quicker way of deleting email from the Holding Pen would be good to save some admin time.
Another idea would be some email header filters which send matching email to the holding pen. It would then be possible to integrate SpamAssassin or similar products, and react to mail above a certain spam score by putting it in the holding pen.
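A rough sketch of what such a header filter could look like, in Python. It assumes the MTA has already passed the message through SpamAssassin, which adds an `X-Spam-Status` header of the form `Yes, score=7.2 required=5.0 ...`. The threshold value and the function names are hypothetical, not anything Alaveteli ships.

```python
import re
from email import message_from_string

# Assumption: SpamAssassin has already run and added a header such as
# "X-Spam-Status: Yes, score=7.2 required=5.0 tests=..."
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")

HOLDING_PEN_THRESHOLD = 5.0  # hypothetical threshold, tune per site


def spam_score(raw_email: str):
    """Return the SpamAssassin score from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_email)
    match = SCORE_RE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def route(raw_email: str) -> str:
    """Return "holding_pen" for likely spam, otherwise "deliver"."""
    score = spam_score(raw_email)
    if score is not None and score >= HOLDING_PEN_THRESHOLD:
        return "holding_pen"
    return "deliver"
```

Mail without the header is delivered as normal, so the filter only kicks in where a score is actually available.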
One last comment - the "Handle rejected responses with bounce" setting seems to imply a bounce message produced by the Rails application. I think this is a really bad idea, and should be removed. It is impossible to ensure a bounce is going to the real sender unless you reject it at SMTP-time. Anything else contributes to backscatter. Having been on the receiving end over the years of many spam runs faking my email address, it can make some poor sod's email life hell for days or weeks.
To tackle those creating user profiles just to publish spam links, we could prevent new users from including any links in their "about me" text for, say, a month after sign-up.
Anyone trying to add a link before then could just get a message saying they're not an established user.
We don't need to help spammers by publicly defining an established user.
(I wouldn't make it conditional on making requests as we don't want to incentivise spammers to make requests).
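To make the suggestion concrete, a minimal sketch in Python of the rule as described: it depends only on account age, not on whether the user has made requests. The 30-day window, the link regex, the function name and the message wording are all assumptions for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical values: the suggestion is "say a month", and the exact
# definition of an established user is deliberately not shown to users.
MIN_ACCOUNT_AGE = timedelta(days=30)
LINK_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def about_me_error(about_me, created_at, now=None):
    """Return an error message if a new user tries to add a link, else None."""
    now = now or datetime.now(timezone.utc)
    is_established = now - created_at >= MIN_ACCOUNT_AGE
    if not is_established and LINK_RE.search(about_me):
        # Vague on purpose: do not tell spammers what "established" means.
        return "Sorry, only established users can include links here."
    return None
```

Established users and link-free profiles pass straight through; only the combination of a new account and a link is blocked.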
In case anyone's thinking why bother - there could be a negative impact on Alaveteli sites as a result of them carrying spammy content - there's a risk to reputations both among humans and search engines.
We updated to the new reCAPTCHA 2 about 5 months ago (803390c4), so this might have improved the situation. We only render reCAPTCHA for signups if the IP is not in the host country; maybe we should render it all the time.
@garethrees The problem of accounts being created to place spam links on user pages is currently affecting WhatDoTheyKnow:
https://www.whatdotheyknow.com/search/http/users/newest?query=http&utf8=%E2%9C%93
The latest account shown via that search was created at 2016-02-29 11:03:12 +0000, the 50th at 2016-02-27 05:02:52 +0000, and the 100th at 2016-02-25 05:16:06 +0000.
So about 100 such spam accounts have been created over the last 4 days.
Other reasons why tackling spam (this time on request threads) is important:
We should definitely move to adding recaptcha for all new signups in that case.
Just to add that the amount of spam in the holding pen is an issue for administrators; this was raised on the WhatDoTheyKnow catchup call.
See also #217 and #2097 on the question of backscatter.
I wonder if there's value in having the spam check happen first here - so if the spam is detected, no bounce would be sent.
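As a sketch of the ordering I have in mind (Python, purely illustrative): the function name, the action strings and the fallback for spam are assumptions, not existing Alaveteli behaviour.

```python
def action_for_rejected_mail(is_spam, configured_action, spam_action="holding_pen"):
    """Pick what to do with a response that the request no longer accepts.

    The spam check runs first, so a message flagged as spam never reaches
    the configured action and therefore never triggers a bounce to a
    (probably forged) sender address.
    """
    if is_spam:
        return spam_action  # hypothetical fallback; could equally be a discard
    return configured_action
```

With this order, `action_for_rejected_mail(True, "bounce")` falls back to the holding pen, and only non-spam mail is ever bounced.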
Some documentation has been published at http://alaveteli.org/docs/running/handling_spam
I will publicise http://alaveteli.org/docs/running/handling_spam on the Community Update
Dumping https://code.facebook.com/posts/894756093957171/spam-fighting-scale-2016/ here for searchability
RESTRICT_NEW_RESPONSES_ON_OLD_REQUESTS_AFTER_MONTHS
INCOMING_EMAIL_SPAM_*