openrightsgroup / blocked-org-uk

Template front-end code, markup, style-sheets, images and other assets for the Censorship Monitoring Project (blocked.org.uk)
https://www.blocked.org.uk/
GNU General Public License v3.0

Mitigate spam submissions through web front-end #47

Closed (graphiclunarkid closed 10 years ago)

graphiclunarkid commented 10 years ago

We are already starting to see spam introduced into our development server (example below) so I think we're going to have to consider adding a captcha (or another mechanism) to mitigate this.

-------- Original Message --------
Subject: Blocked site http://uk.news.yahoo.com/decaying-trees-may-key-mysterious-dune-holes-152010548.html#ytlmZnL
Date: Sat, 17 May 2014 16:45:27 +0100
From:
To:

Email:

Domain to check: http://uk.news.yahoo.com/decaying-trees-may-key-mysterious-dune-holes-152010548.html#ytlmZnL

Type of site: Select a category

Additional info:

Join mailing list: [[+joinlist]]

Happy to be contacted: [[+allowcontact]]

EvelynSubarrow commented 10 years ago

Captcha sounds the most realistic, although it might be irritating to real humans.... Perhaps we could have some kind of alternative in parallel for those we trust enough to be human?

mkillock commented 10 years ago

FormIt has a recaptcha hook, which looks easy enough to add:

http://rtfm.modx.com/extras/revo/formit/formit.hooks/formit.hooks.recaptch

I didn't do this because I think we need to sign up for a recaptcha account for the blocked.org.uk domain - perhaps we should be using an ORG email address/contact to do this? Or maybe ORG has such an account already that we can use?

https://www.google.com/recaptcha/admin#whyrecaptcha

mkillock commented 10 years ago

There are alternative captchas that involve doing maths, if that is preferred?

graphiclunarkid commented 10 years ago

Via email, @JimKillock has said we have a recaptcha account already, and it's implemented with formit on the main modx back-end. Should be easy enough to incorporate into blocked.org.uk therefore.

webal commented 10 years ago

I've had a lot of success from just renaming the 'email' and 'url' fields to something random like a hash, and then adding hidden fields called 'email' or 'url' to the form. Automated spam bots will always fill out these fields, whereas it's not possible for a genuine user to do that. So you check the submitted form for anything in the hidden 'email' field, and if there is something, ignore the submission.
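
The check described above can be sketched in a few lines. This is only an illustration of the honeypot idea in Python, not the project's MODX code; the function name and the `HONEYPOT_FIELDS` constant are hypothetical.

```python
from typing import Mapping

# Hypothetical names: these are the decoy fields a human visitor never sees,
# so a genuine submission should always leave them blank.
HONEYPOT_FIELDS = ("email", "url")


def looks_like_spam(form: Mapping[str, str]) -> bool:
    """Return True if any decoy field came back with a non-blank value."""
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)
```

A submission whose hidden 'email' field contains anything at all would be treated as spam, while one that only fills in the renamed (hashed) fields would pass.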

graphiclunarkid commented 10 years ago

@webal, that's genius!

mkillock commented 10 years ago

graphiclunarkid - so we'll do that after the move to live?

mkillock commented 10 years ago

or indeed use webal's suggestion as an alternative?

graphiclunarkid commented 10 years ago

@webal's idea is (probably) quite a simple change, and has the benefit of being invisible to visitors, so why don't we try that to start with? If it's not sufficient we can always add recaptcha later.

mkillock commented 10 years ago

I noticed that there is already a hidden 'url' field in the form:

<input type="hidden" name="url" value="">

and the real 'url' field is:

<input type="url" name="domainToCheck">

So, for @webal 's idea, what changes would we need to make?

webal commented 10 years ago

I think the existing one may be a hangover from the HTML mockups.

If you change the name & id of the email & url inputs to a random string, and then change the backend to match, the form should continue to work as it does.

Once that's done, I'd stick in a couple of hidden fields with the names 'email' & 'url'. Then, when ModX receives a form, it should first check whether the email or url fields have a value; if they do, you can just disregard the submission.
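
One way to picture the renaming step is a small mapping between obscured field names and the names the back end actually wants. The sketch below is in Python purely for illustration; the helper names, the `secret` parameter and the 'contactEmail' field name are assumptions, and the real change would live in the MODX snippet.

```python
import hashlib


def hashed_name(field: str, secret: str) -> str:
    """Derive a stable, meaningless-looking name for a real form field."""
    digest = hashlib.sha256(f"{secret}:{field}".encode("utf-8")).hexdigest()
    # A leading letter keeps the result usable as an HTML id as well as a name.
    return "f" + digest[:12]


def build_field_map(real_fields, secret):
    """Map each obscured name back to the meaningful name the back end uses."""
    return {hashed_name(field, secret): field for field in real_fields}


def translate_submission(form, field_map):
    """Copy values from obscured names to meaningful names; ignore the rest."""
    return {real: form.get(obscured, "") for obscured, real in field_map.items()}
```

The decoy 'email' and 'url' fields are deliberately absent from the map, so whatever a bot types into them never reaches the translated submission.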

mkillock commented 10 years ago

OK, I'll have a go at this later. I will need to keep recording the submission to the formsave table otherwise the form will refuse to submit (maybe that's preferred?) but I can mark it as 'suspect spam' or similar. And I can add that check in the url submission so we don't add it to the queue. How does that sound?

webal commented 10 years ago

should be fine, if it clogs up the formsave table we can always look at that in the future...

ei8fdb commented 10 years ago

I think I follow…

How would that affect a screen reader user? The screen reader software will read the HTML of a page. Will the hash be present there?


ei8fdb commented 10 years ago

If you want to try this change I can test it on my screen reader to see how usable, or not, it is.


webal commented 10 years ago

I don't think accessibility is affected: the hash is only used for the name and id attributes of the input tag. So long as <label> is used correctly with the hash (its for attribute matching the input's id), there shouldn't be any issues - that I can think of anyway!
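
To make the accessibility point concrete: a screen reader announces the label text, which is tied to the input through matching for and id attributes, so the hashed value itself is not what the user hears. A minimal sketch in Python that renders such a pair (the function name and the example id are hypothetical, and this is not the site's actual template):

```python
from html import escape


def labelled_input(label_text: str, field_id: str, input_type: str = "text") -> str:
    """Render a <label>/<input> pair whose 'for' and 'id' attributes match."""
    fid = escape(field_id, quote=True)
    return (
        f'<label for="{fid}">{escape(label_text)}</label>\n'
        f'<input type="{escape(input_type, quote=True)}" id="{fid}" name="{fid}">'
    )
```

Calling `labelled_input("Your email", "f3a9c1", "email")` would produce a label reading "Your email" bound to an input whose id and name are both the obscured string.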

mkillock commented 10 years ago

I've written something for this now, changes made to test submission page. Seems to work, could @webal check that I have done the form as proposed? Specifically, I left in the type=url values and suchlike. Presumably that can't be changed.

This in effect adds two new entries to the formSave table, because I copied the hash-value variables into fields with more meaningful names. There doesn't seem to be a way of removing unwanted values from a FormIt hook.

Changes made: (for migration to new snippet purposes)

mkillock commented 10 years ago

Another thought: As this changes the id parameter, does that affect any CSS styling?

webal commented 10 years ago

I've copied the form over to the home page and it's all working great :) The id isn't an issue as the styling is all done with classes. The type='url' should remain; I don't think it will affect the detection of spam, as spammers will still end up filling out the fake email field anyway.

mkillock commented 10 years ago

ok! Thanks.

I've renamed the snippets: oldSubmitURL is the one that doesn't use the hashes, and SubmitURL is now the one we're using. I've updated the FormIt call.

mkillock commented 10 years ago

@webal - sorry, were we editing the home page at the same time? The stats now look wrong. I had noticed a problem with the form when we submit something incorrect - this is working now, but I may have overwritten some changes you made?

webal commented 10 years ago

We may have been, I'm tweaking the stats atm - just trying to get the JS working, I'll hold off making any more changes till you're finished - just let me know when you're done

mkillock commented 10 years ago

ok, I'm done on the form changes. I could put those stats calcs in the GetURLStats? (Which appears to be missing on the home page atm)

webal commented 10 years ago

Cheers, that's fine I'll copy them through when I copy the stats over from the repo

webal commented 10 years ago

Did something change with the way the stats were pulled out? I think I was using [[!+blocked_sites_detected]] before to pull out the stats but it doesn't seem to be returning anything now. Any chance you could double check I've not messed up on the home page please - I won't touch it till I hear back.

mkillock commented 10 years ago

[[!GetURLStats]] was missing - this snippet puts the values into those placeholders.

Nice Percentage thing! :)

webal commented 10 years ago

Brilliant - thanks

mkillock commented 10 years ago

I think we can close this - the SubmitURL hook only submits a URL if the fake 'url' and 'email' fields are blank. The formSave table is amended with APIResult='SPAM' (it's not really a response from the API) so these can be identified later. But basically, we're ignoring spammy form entries.
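
The final behaviour described here (always record the submission, flag it as SPAM when a decoy field is filled, and only queue clean URLs) can be summarised in a short sketch. It is written in Python for illustration only; `form_log` and `url_queue` are hypothetical stand-ins for the formSave table and the URL submission queue, and the `url_field` default is an assumption.

```python
def handle_submission(form, form_log, url_queue, url_field="domainToCheck"):
    """Record every submission, but only queue URLs from non-spam ones.

    Returns True if the URL was queued, False if it was flagged as spam.
    """
    record = dict(form)
    # The decoy 'email' and 'url' fields must both be blank for a real visitor.
    is_spam = bool(form.get("email", "").strip() or form.get("url", "").strip())
    if is_spam:
        # Mirrors the thread: the flag is stored where an API result would go.
        record["APIResult"] = "SPAM"
    else:
        url_queue.append(form.get(url_field, ""))
    form_log.append(record)
    return not is_spam
```

Keeping the flagged rows rather than discarding them matches the earlier decision to leave them in the saved-forms table so they can be reviewed or cleaned up later.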