openaustralia / righttoknow

Theme for, and issues specific to, Right To Know.
https://www.righttoknow.org.au/
MIT License
21 stars, 14 forks

We have a lot of Spam Users #826

Open: benrfairless opened this issue 5 months ago

benrfairless commented 5 months ago

Here's an easy search: https://www.righttoknow.org.au/admin/users?page=1&query=http&sort_order=created_at_desc

That's 3 pages of results, so I'm assuming over 300 accounts. I'm sure some are legit, but most of them are for adult content and the rest are commercial businesses doing link farming.

Surely there has to be a way to get around this.

katska commented 3 months ago

@benrfairless did you have something specific in mind?

benrfairless commented 1 month ago

I've gone through and cleaned up another batch of these users. Ideally, it would be good to just purge them from the site.

Reached out to the alaveteli-dev group to see if there's any way to do this.

benrfairless commented 3 weeks ago

Missive conversation: https://mail.missiveapp.com/#inbox/conversations/60670194-e2b8-4f5f-ad4d-71d3fb80e1d8

We don't have a script exactly like the above – you'd need to write one. Should be pretty easy, along the lines of:

    # Delete banned users who have never made a request, comment or track
    User.banned.find_each do |user|
      user.destroy! if user.info_requests.none? && user.comments.none? && user.track_things.none?
    end

n.b. I haven't checked this at all; it's just an off-the-cuff gist of the kind of code you'd need to write. You can then run it through rails console or rails runner.
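Since `destroy!` is irreversible, it may be worth dry-running the same rule first. The sketch below is plain Ruby so it runs outside Rails: `FakeUser`, `purgeable?` and `purge_candidates` are hypothetical names, and the Struct only mimics the `info_requests`, `comments` and `track_things` associations named in the loop above (an assumption about the real `User` model). In a real console session, the equivalent would be printing each candidate inside `find_each` instead of destroying it.

```ruby
# Hypothetical stand-in for Alaveteli's User model (assumption: the real
# model exposes banned status plus these three associations).
FakeUser = Struct.new(:name, :banned, :info_requests, :comments, :track_things,
                      keyword_init: true)

# Same rule as the suggested loop: only banned users who never created
# a request, comment or track are candidates for deletion.
def purgeable?(user)
  user.banned &&
    user.info_requests.empty? &&
    user.comments.empty? &&
    user.track_things.empty?
end

# Dry run: select the candidates instead of destroying them.
def purge_candidates(users)
  users.select { |user| purgeable?(user) }
end

USERS = [
  FakeUser.new(name: "link-farm", banned: true, info_requests: [], comments: [], track_things: []),
  FakeUser.new(name: "banned-requester", banned: true, info_requests: [:request], comments: [], track_things: []),
  FakeUser.new(name: "regular-user", banned: false, info_requests: [], comments: [], track_things: [])
]

# Report what would be removed, so the list can be reviewed by hand.
purge_candidates(USERS).each { |user| puts "would destroy: #{user.name}" }
```

Only `link-farm` is selected here: the banned user with a request is kept, as is the unbanned user.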

We've faced similar problems in the past (and still get spam sign-ups, though these days the impact is much lower since user profiles and pictures aren't as indexable by search engines), so we wrote https://github.com/mysociety/alaveteli/blob/develop/lib/tasks/cleanup.rake#L16-L54, which might be of interest.

Also:

This only tangentially answers your question, as it's about preventing spam account creation rather than removing existing accounts.

We had issues with spam accounts for a while as well, until I realised they all used a URL in their account names. I added a spam scoring rule to block such account names at creation, and the problem stopped. I don't know if that's the case for you, but just in case:

Patch https://github.com/mysociety/alaveteli/blob/develop/config/initializers/user_spam_scorer.rb#L7, replacing the content of that line with:

    settings = YAML.load(File.read(path), permitted_classes: [Regexp])['user_spam_scorer']

This allows regular expressions to be defined in the config file mentioned next; otherwise Ruby raises an error when loading it.

Then update config/user_spam_scorer.yml with something like what we have here: https://gitlab.com/madada-team/dada-core/-/blob/master/ansible/roles/alaveteli/templates/config_user_spam_scorer.yml#L63

In our case, we just added the last line to the default values Alaveteli defines, to reject names containing URLs like https://, which don't seem like useful account names anyway.
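To illustrate why the `permitted_classes: [Regexp]` change matters, here is a self-contained sketch that loads a regexp from YAML and checks account names against it. The key `name_patterns` and the method `suspicious_name?` are hypothetical and not necessarily what Alaveteli's `config/user_spam_scorer.yml` or `UserSpamScorer` use. The sketch calls `YAML.safe_load`, which accepts the same `permitted_classes:` keyword; on Psych 4 (Ruby 3.1+) `YAML.load` is safe by default, which is why the patched initializer line needs it.

```ruby
require "yaml"

# Hypothetical fragment shaped like config/user_spam_scorer.yml.
# "name_patterns" is an illustrative key, not necessarily Alaveteli's real one.
CONFIG = <<~'YAML'
  user_spam_scorer:
    name_patterns:
      - !ruby/regexp /https?:\/\//i
YAML

# Regexp must be explicitly permitted, as in the patched initializer line.
SETTINGS = YAML.safe_load(CONFIG, permitted_classes: [Regexp])["user_spam_scorer"]
PATTERNS = SETTINGS.fetch("name_patterns")

# True if any configured pattern matches the proposed account name.
def suspicious_name?(name)
  PATTERNS.any? { |pattern| pattern.match?(name) }
end

# Without permitted_classes, the same YAML is rejected with an exception.
begin
  YAML.safe_load(CONFIG)
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.class}"
end

puts suspicious_name?("Cheap pills https://example.com")
puts suspicious_name?("Jane Citizen")
```

The `i` flag makes the match case-insensitive, so names like `HTTPS://...` are caught too.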

benrfairless commented 1 week ago

OK, so I've had a look into this. It appears that we're getting a lot of users from countries I wouldn't expect.

We use Cloudflare (not sure which plan), so we might be able to create a rule that applies additional scrutiny to connections from those countries.

@mlandauer is it possible to get access to the Cloudflare configuration for the RTK domain so I can look into this further?