mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org

Review, document and improve spam prevention in Alaveteli #2959

Closed garethrees closed 8 years ago

garethrees commented 8 years ago
lizconlan commented 8 years ago

Useful docs here:

(Note: SpamAssassin support is available in Alaveteli version 0.22.2.0 and up.)

stefanw commented 8 years ago

On FragDenStaat.de (which is not running Alaveteli) we are checking if the apex domain of the incoming "From" email address is equal to the public body's domain, the FOI commissioner's domain or to a domain of any other received email address in the thread.

However, this check is occasionally too strict and has potential for false positives. The messages are not discarded but marked for review as possible spam.
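A minimal sketch of the kind of domain check described above, not FragDenStaat's actual code. The function names are hypothetical, and it assumes the caller already supplies the trusted apex domains (public body, FOI commissioner, and addresses seen earlier in the thread). Subdomains of a trusted apex are treated as matching, and anything else is flagged for review rather than discarded.

```python
from email.utils import parseaddr
from typing import Iterable


def sender_domain(from_header: str) -> str:
    """Return the lower-cased domain of a From header, or '' if none is found."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rsplit("@", 1)[1].lower().rstrip(".")


def matches_trusted_domain(from_header: str, trusted_apex_domains: Iterable[str]) -> bool:
    """True if the sender's domain is a trusted apex domain or a subdomain of one."""
    domain = sender_domain(from_header)
    if not domain:
        return False
    return any(
        domain == apex or domain.endswith("." + apex)
        for apex in (d.lower() for d in trusted_apex_domains)
    )


def classify(from_header: str, trusted_apex_domains: Iterable[str]) -> str:
    """Hypothetical outcome labels: 'accept' or 'review' (possible spam, never discarded)."""
    return "accept" if matches_trusted_domain(from_header, trusted_apex_domains) else "review"
```

Comparing against a suffix with a leading dot avoids treating `notexample.gov` as a match for `example.gov`. A real deployment would probably need a public suffix list to derive apex domains from full addresses.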

olineham commented 8 years ago

On FYI.org.nz we have a very low level of spam (probably as a side effect of NZ authorities sending most responses - even letters - as scans to PDF). That was until recently, when a particular batch of spam started coming in using addresses which could only have been OCR'd from the PDFs. So far setting all those requests to allow from authority_only and handle with holding_pen has taken care of the problem. But I expect this problem to return as OCR harvesting increases and our agencies start sending more machine-readable text.

My next step will be SPF and DKIM.

A quicker way of deleting email from the Holding Pen would be good to save some admin time.

Another idea would be email header filters that send matching email to the holding pen. It would then be possible to integrate SpamAssassin or similar products, and react to mail above a certain spam likelihood by putting it in the holding pen.
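As a rough illustration of the header-filter idea, here is a sketch that reads the `X-Spam-Status` header SpamAssassin adds and routes high-scoring mail to a holding pen. It assumes the mail has already passed through SpamAssassin upstream; the function names, the return labels, and the threshold of 5.0 are illustrative assumptions, not Alaveteli behaviour.

```python
import re
from email import message_from_string
from typing import Optional

# Matches the "score=7.2" part of a header such as:
#   X-Spam-Status: Yes, score=7.2 required=5.0 tests=...
SCORE_PATTERN = re.compile(r"\bscore=(-?\d+(?:\.\d+)?)")


def spam_score(raw_email: str) -> Optional[float]:
    """Return the score from X-Spam-Status, or None if the header is absent."""
    message = message_from_string(raw_email)
    status = str(message.get("X-Spam-Status", ""))
    match = SCORE_PATTERN.search(status)
    return float(match.group(1)) if match else None


def route(raw_email: str, holding_pen_threshold: float = 5.0) -> str:
    """Hypothetical routing: 'holding_pen' at or above the threshold, else 'deliver'."""
    score = spam_score(raw_email)
    if score is not None and score >= holding_pen_threshold:
        return "holding_pen"
    return "deliver"
```

Mail without the header is delivered as normal here, so the filter only ever adds caution rather than silently dropping anything.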

One last comment: the "Handle rejected responses with bounce" setting seems to imply a bounce message produced by the Rails application. I think this is a really bad idea, and it should be removed. It is impossible to ensure a bounce goes to the real sender unless you reject the message at SMTP time; anything else contributes to backscatter. Having been on the receiving end of many spam runs faking my email address over the years, I know it can make some poor sod's email life hell for days or weeks.

RichardTaylor commented 8 years ago

To tackle those creating user profiles just to publish spam links, we could prevent new users from including any links in their "about me" text for, say, a month after sign-up.

Anyone trying to add a link before then could just get a message saying they're not an established user.

We don't need to help spammers by publicly defining an established user.

(I wouldn't make it conditional on making requests as we don't want to incentivise spammers to make requests).
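The proposal could be sketched roughly as follows. This is not Alaveteli code: the function names, the 30-day probation period, and the simple link pattern are all assumptions for illustration. The check depends only on account age, not on whether the user has made requests.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Deliberately simple pattern: anything that looks like the start of a URL.
LINK_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)
PROBATION = timedelta(days=30)  # "say a month"; the exact value is an assumption


def contains_link(text: str) -> bool:
    """True if the text appears to contain a link."""
    return bool(LINK_PATTERN.search(text))


def may_save_about_me(about_me: str, signed_up_at: datetime,
                      now: Optional[datetime] = None) -> bool:
    """Allow link-free text always; allow links only after the probation period."""
    now = now or datetime.now(timezone.utc)
    if not contains_link(about_me):
        return True
    return now - signed_up_at >= PROBATION
```

The message shown on rejection would stay generic ("not an established user") so the threshold itself is not disclosed.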

In case anyone's thinking why bother - there could be a negative impact on Alaveteli sites as a result of them carrying spammy content - there's a risk to reputations both among humans and search engines.

garethrees commented 8 years ago

We updated to the new reCAPTCHA 2 about 5 months ago (803390c4), so this might have improved the situation. We only render reCAPTCHA for signups if the IP is not in the host country; maybe we should render it all the time.

RichardTaylor commented 8 years ago

@garethrees The problem of accounts being created to place spam links on user pages is a current one on WhatDoTheyKnow:

https://www.whatdotheyknow.com/search/http/users/newest?query=http&utf8=%E2%9C%93

The latest account shown via that search was created at 2016-02-29 11:03:12 +0000, the 50th at 2016-02-27 05:02:52 +0000, and the 100th at 2016-02-25 05:16:06 +0000.

So about 100 such spam accounts have been created over the last 4 days.

RichardTaylor commented 8 years ago

Other reasons tackling spam (this time on request threads) is important:

garethrees commented 8 years ago

> So about 100 such spam accounts have been created over the last 4 days.

We should definitely move to adding reCAPTCHA for all new signups in that case.

RichardTaylor commented 8 years ago

Just to add that the amount of spam in the holding pen is an issue for administrators; this was raised on the WhatDoTheyKnow catchup call.

crowbot commented 8 years ago

See also #217 and #2097 on the question of backscatter.

crowbot commented 8 years ago

I wonder if there's value in having the spam check happen first here - so if the spam is detected, no bounce would be sent.
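The ordering could look something like this sketch, where the spam verdict is consulted before any rejection handling so that a bounce is never generated for detected spam. The function name, the labels, and the choice to send spam to the holding pen are assumptions for illustration only.

```python
def disposition(is_spam: bool, sender_allowed: bool,
                rejected_handling: str = "bounce",
                spam_handling: str = "holding_pen") -> str:
    """Decide what to do with an incoming message, checking for spam first."""
    if is_spam:
        # Spam never reaches the bounce path, so it cannot cause backscatter.
        return spam_handling
    if sender_allowed:
        return "deliver"
    return rejected_handling
```

With this order, a bounce is only sent for a non-spam message from a sender the request does not accept.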

garethrees commented 8 years ago

Some documentation has been published at http://alaveteli.org/docs/running/handling_spam

Gemmamysoc commented 8 years ago

I will publicise http://alaveteli.org/docs/running/handling_spam on the Community Update

garethrees commented 7 years ago

Dumping https://code.facebook.com/posts/894756093957171/spam-fighting-scale-2016/ here for searchability