Open benrfairless opened 9 months ago
@benrfairless did you have something specific in mind?
I've gone through and cleaned up another batch of these users. Ideally, it would be good to just purge them from the site.
Reached out to the alaveteli-dev group to see if there's any way to do this.
Missive conversation: https://mail.missiveapp.com/#inbox/conversations/60670194-e2b8-4f5f-ad4d-71d3fb80e1d8
We don't have a script exactly like the above – you'd need to write one. Should be pretty easy, along the lines of:
```ruby
# Destroy banned users who have left no content behind.
User.banned.find_each do |user|
  user.destroy! if user.info_requests.none? &&
                   user.comments.none? &&
                   user.track_things.none?
end
```
n.b. I haven't checked this at all – it's just an off-the-cuff gist of the kind of code you'd need to write. You can then run it through `rails console` or `rails runner`.
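Before running anything destructive in `rails console`, the guard condition from the snippet above can be extracted into a standalone predicate and sanity-checked with stubbed users (the stubs below are illustrative stand-ins, not Alaveteli objects):

```ruby
require 'ostruct'

# The same guard as in the snippet above: a banned user is only purgeable
# if they have no requests, comments or tracked things.
purgeable = lambda do |user|
  user.info_requests.none? && user.comments.none? && user.track_things.none?
end

# Stub users standing in for ActiveRecord records, for a quick dry run.
spam_user  = OpenStruct.new(info_requests: [], comments: [], track_things: [])
legit_user = OpenStruct.new(info_requests: [:a_request], comments: [], track_things: [])

purgeable.call(spam_user)  # => true
purgeable.call(legit_user) # => false
```

In the real app you could first do `User.banned.find_each.count { |u| purgeable.call(u) }` to see how many accounts would be removed before switching to `destroy!`.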
We've faced similar in the past (and still get spam signups, though these days the impact is much lower since user profiles/pictures aren't as indexable by search engines) so we wrote https://github.com/mysociety/alaveteli/blob/develop/lib/tasks/cleanup.rake#L16-L54 which might be of interest.
Also:
Only tangentially answering your question, this is more about preventing spam account creation than removing them.
We had issues with spam accounts for a while as well, until I realised they all used a url in their account names, so I added a spam scoring rule to block such account names at creation, and the problem stopped. I don't know if it is the case for you, but just in case:
patch https://github.com/mysociety/alaveteli/blob/develop/config/initializers/user_spam_scorer.rb#L7 to replace its content with:

```ruby
settings = YAML.load(File.read(path), permitted_classes: [Regexp])['user_spam_scorer']
```
This allows regexps to be defined in the config file below; otherwise Ruby will crash when loading them.
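The crash is easy to reproduce in a standalone script. The key and pattern names below are illustrative, not Alaveteli's actual defaults; `safe_load` is what `YAML.load` delegates to on Ruby 3.1+, which is why the stock initializer falls over on `!ruby/regexp` tags:

```ruby
require 'yaml'

# Illustrative config fragment with a regexp, similar in shape to
# user_spam_scorer.yml (key names are made up for this demo):
config = <<~'YAML'
  user_spam_scorer:
    name_patterns:
      - !ruby/regexp '/https?:\/\//'
YAML

# Psych refuses the !ruby/regexp tag by default:
crashed = begin
  YAML.safe_load(config)
  false
rescue Psych::DisallowedClass
  true
end
# crashed => true

# Permitting Regexp makes the patterns load as real regexps:
settings = YAML.safe_load(config, permitted_classes: [Regexp])['user_spam_scorer']
pattern  = settings['name_patterns'].first
pattern.match?('https://spam.example') # => true
```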
Then update config/user_spam_scorer.yml with something like what we have here: https://gitlab.com/madada-team/dada-core/-/blob/master/ansible/roles/alaveteli/templates/config_user_spam_scorer.yml#L63
In our case, we just added the last line to the default values Alaveteli defines, to block URL-like account names containing https:// – which doesn't seem like a useful account name anyway.
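If you want to check what such a rule would catch before deploying it, a pattern along those lines can be exercised outside the app (this is my own illustration, not the exact pattern from the linked config):

```ruby
# Illustrative rule: flag any account name containing a URL scheme.
URL_IN_NAME = %r{https?://}

def spammy_name?(name)
  name.match?(URL_IN_NAME)
end

spammy_name?('BestCasino https://spam.example') # => true
spammy_name?('https://linkfarm.example')        # => true
spammy_name?('Jane Citizen')                    # => false
```

Running existing account names through a check like this is also a quick way to estimate how many of your current spam accounts the rule would have blocked at signup.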
OK so I've had a look into this. It appears that we are getting a lot of people from countries I wouldn't expect.
We use Cloudflare (not sure which plan) so it's possible we might be able to create a rule that puts some additional scrutiny on connections from those countries.
@mlandauer is it possible to get access to the Cloudflare configuration for the RTK domain so I can look into this further?
The way I combat e.g. genAI scraping is to make it expensive, and the same goes for email servers: I make it computationally expensive for spammers to do anything.
People also use VPNs, so geo-blocking is typically sub-optimal; anything that relies on mapping users to IP addresses generally doesn't work.
One option could be to have trust levels for users: new users get very low trust, and trusted users can "vouch" for (or give a thumbs-up to) other users to raise their trust.
This way one could also build some community around trust and raise the bar for how people should use the platform, since good behaviour gets recognition and visibility.
Currently people don't see high expectations: there are quite a few misguided requests, and people don't see either the need for, or the reward of, being successful with their requests – but this is another issue I want to raise later, and I have some ideas for how to develop it.
FWIW – one could require new users to receive one vouch from an existing trusted user before they can send requests; until then, their requests would stay drafts.
A new user could easily create one or more draft requests, and anyone trusted / established could quickly review them from some accessible pipeline or notification queue; whoever gets to one first reviews it.
This could also add peer review, improve request quality, and make the platform more community-oriented.
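To make the idea concrete, here's a minimal, entirely hypothetical sketch of draft-until-vouched behaviour – none of these classes or methods exist in Alaveteli, it's just one way the rules above could fit together:

```ruby
# Hypothetical sketch only; Alaveteli has no Member/vouch model like this.
class Member
  attr_reader :name, :vouchers

  def initialize(name, trusted: false)
    @name = name
    @trusted = trusted
    @vouchers = []
  end

  # A member is trusted if seeded as such, or vouched for by someone trusted.
  def trusted?
    @trusted || @vouchers.any?(&:trusted?)
  end

  # Only trusted members may vouch; the vouch raises the recipient's trust.
  def vouch_for(other)
    other.vouchers << self if trusted?
  end

  # New users' requests stay drafts until the user is trusted.
  def request_state
    trusted? ? :published : :draft
  end
end

veteran  = Member.new('veteran', trusted: true)
newcomer = Member.new('newcomer')

newcomer.request_state        # => :draft
veteran.vouch_for(newcomer)
newcomer.request_state        # => :published
```

A real implementation would of course need rate limits on vouching and a way to revoke trust if a vouched account turns out to be spam.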
Here's an easy search: https://www.righttoknow.org.au/admin/users?page=1&query=http&sort_order=created_at_desc
3 pages, so I'm assuming over 300. I'm sure some are legit, but most of them are for adult content and the others are commercial businesses doing link farming.
Surely there has to be a way to get around this.