openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0

suppress creating accounts by bots/scripts #1083

Open malenki opened 8 years ago

malenki commented 8 years ago

Looking at the user accounts created from November 05 to 09 (that is, during the last 108 hours), I count 11318 accounts. A simple

grep _name *| cut -d '"' -f 4| grep -E -i "[a-z][0-9]{1,3}" -c 

gives 6576 occurrences of names like this:

William925i6f
William925i6fs0
William936f5an8
William937j8ka4
William937l8mc4
William937l8md4
William937m8n9
William939v1zr0
William947m8n
William947r3yu4
William948v2wd6
William959x5if7
William962h2ro1
William975o4he9
William9b2h1dn8
William9f7j7ja4
William9i4d2kx5
William9l9f5px2
William9o3k6uf1
William9q0q0qf6
William9q9l8hu1
William9q9o9nc4
William9r0o8jx2
William9r1t1tk6
William9s1t9pg6
William9s1u0tj7
William9s2s0cs2

There are also some false positives, but a stricter

grep _name *| cut -d '"' -f 4| grep -E -i "[a-z][0-9]{1,3}[a-z][0-9]{1,3}" -c

still has 4367 hits. Looking for names with spam in them:

grep _name *| egrep -i "premium|cash|bank|credit|mobil|phone|handy|pharma|viagra|free|generic" -c

gives a quite meagre 30 results.
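
For anyone who wants to try the two name patterns without the shell pipeline, here is a minimal Python sketch of the same checks. The regular expressions are copied from the grep calls above; `count_matches` and the sample list are illustrative only (two names from the list above plus two made-up ordinary names), not part of any OSM tooling.

```python
import re

# Same patterns as the two `grep -E -i` calls: case-insensitive, match anywhere in the name.
LOOSE = re.compile(r"[a-z][0-9]{1,3}", re.IGNORECASE)
STRICT = re.compile(r"[a-z][0-9]{1,3}[a-z][0-9]{1,3}", re.IGNORECASE)


def count_matches(names, pattern):
    """Count the names that contain the pattern, like `grep -c` with one name per line."""
    return sum(1 for name in names if pattern.search(name))


# Two names from the list above, plus two hypothetical ordinary names.
sample = ["William925i6f", "William947m8n", "malenki", "mapper2015"]

loose_hits = count_matches(sample, LOOSE)    # also catches "mapper2015", a false positive
strict_hits = count_matches(sample, STRICT)  # needs letter-digits-letter-digits
print(loose_hits, strict_hits)
```

The stricter pattern drops names that merely end in a year or number, which is where most of the false positives of the first pattern are likely to come from.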

With the above I only want to show that, at least during the last four days, around 50% of the newly registered accounts seem to have been created by bots/scripts and presumably won't be used for the benefit of OSM. Instead of admins having to remove these users by hand, it would save work if scripts had a harder time registering accounts.

I know that it is easy to say "I'd like to have" but hard to solve the issue. Still, I hope you can find a solution.

See also: #841 and my diary entry that resulted in this issue. The data sample I used can be found here.

katpatuka commented 8 years ago

Well spotted!

If scripts can be used to create accounts, a captcha on the signup page would at least be good.

gravitystorm commented 8 years ago

@katpatuka Yes, a captcha would be nice, but as far as I know there aren't any non-proprietary, effective captchas available.

I investigated this a few months ago (with respect to the wiki, not this site) and I found https://www.mediawiki.org/wiki/Extension:ConfirmEdit provides a good, up-to-date overview of the options. I don't think we'd go for the only one on that list marked as "high" effectiveness (English-only, potential for adverts), and ReCaptcha always has the problem that by using it, we're helping Google create a proprietary map dataset!

If anyone knows of an effective, non-proprietary captcha then that would be very useful.

tomhughes commented 8 years ago

Are these accounts actually doing anybody any harm?

We're not a startup that is using user numbers as a measure of success, so if they're just sitting there then who cares?

At the end of the day anything we do will just be a constant arms race where we make life harder and harder for the real users.

If there was a good solution then that would be wonderful, but there isn't, or everybody would be using it!

Zverik commented 8 years ago

I am strongly against captcha, because it raises the bar dramatically. We should aim to simplify registration process, not make it harder. These spammy accounts do no harm, I suppose, until they vandalize the map.

gplv2 commented 8 years ago

It's meant to raise the bar. Prevention is always easier than a cure. If someone doesn't know how to register (a one-time action) for OSM as it stands now, that person is probably not suited to mapping.

planemad commented 8 years ago

Noticed this while joining publiclab.org and it was pretty fun to fill out. Of course, you need to know English for this.

[Image: screenshot 2015-11-26 12 42 52]

mikelmaron commented 8 years ago

@planemad I'd be curious to hear from PublicLab if that helped cut down their spam problems

getschomp commented 8 years ago

We could use a honeypot or hidden form field that only bots see. https://github.com/markets/invisible_captcha https://github.com/curtis/honeypot-captcha That seems like the simplest and most unobtrusive solution. Could I maybe work on this?
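
To make the honeypot idea concrete, here is a minimal, framework-free sketch in Python of the server-side check. It is not how either linked gem is implemented; the field name `website` and the helper `looks_like_bot` are assumptions for illustration. The form would include an extra input hidden from humans (for example with CSS), so a real user leaves it empty while a script that fills every field does not.

```python
# Hypothetical name of the hidden field; humans never see it, so it should stay empty.
HONEYPOT_FIELD = "website"


def looks_like_bot(form: dict) -> bool:
    """Return True if the hidden honeypot field arrived with any content."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(value.strip())


# A human submission leaves the hidden field empty.
human = {"email": "a@example.org", "display_name": "mapper", "website": ""}
# A naive script fills in every field it finds, including the hidden one.
bot = {"email": "b@example.org", "display_name": "William925i6f", "website": "http://spam.example"}

print(looks_like_bot(human), looks_like_bot(bot))
```

This only stops scripts that fill in forms blindly; a bot written specifically for the signup page could skip the hidden field.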

planemad commented 8 years ago

@getschomp the honeypot definitely sounds like the smartest idea so far.

d1g commented 7 years ago

> Are these accounts actually doing anybody any harm?

@tomhughes they complicate otherwise easy analysis, so yes; useless accounts are useless for anything

http://www.openstreetmap.org/user/SimonPoole/diary/40246

simonpoole commented 7 years ago

@d1g there is no indication that these accounts are being added automatically, quite the contrary.

srravya commented 7 years ago

A user diary completely spammed with comments. Except for the first comment (which was legit), the rest seem to be from multiple spam accounts.

d1g commented 7 years ago

@simonpoole, I saw how several sites asked users to confirm accounts in the next 3 months or a year. Then they withdrew unused accounts.

We can deactivate unused accounts (without any edits or comments since 2004) once.

A clear public announcement beforehand is a must, of course.

HolgerJeromin commented 7 years ago

> We can deactivate unused accounts (without any edits or comments since 2004) once.

What would be the benefit?

simonpoole commented 7 years ago

@d1g I would be (very very) strongly against doing that. Every day we have users that "reactivate" their pre-licence change account by accepting the CTs, 400 since the beginning of this year alone, thousands over the past years.

All these accounts were last used -before- May 2010. There is no reason to believe that this pattern is different for more recent accounts either, so by forcibly removing them we would simply be shooting ourselves in the foot.

d1g commented 7 years ago

@HolgerJeromin, to filter out scripted accounts (they semi-defeat the benefits of registration). It may work for very active communities. Credibility of users matters in OSM, probably even more than in Wikimedia projects.

I can buy hundreds of accounts virtually anywhere on the black market. That doesn't mean they would be used for anything good.

But, as Simon pointed out, it is painful in OSM: logins can span years.

CloCkWeRX commented 5 years ago

One area this manifests in is via the diary feature (and corresponding RSS feeds), and we now have the default of review-my-changesets set for newer accounts.

Could we optionally use a (potentially proprietary) captcha on a new diary post; when:

Options such as https://github.com/desirepath41/visualCaptcha do exist now; even if they aren't maintained at the moment.

With very specific criteria, we could avoid accidentally adding barriers for new editors; and only mildly inconvenience people who jump into editing via JOSM or other editors.

CloCkWeRX commented 5 years ago

Another potential option: rate limit diary posts per account/IP address after the first .. 3-4? 10? with an exponential backoff. Large institutions, VPN users and similar may be slightly affected; but this could be done by excluding posts from the RSS feed (modelling a diary "publish at" timestamp, and only selecting posts to publish between Time.now and 3.days.ago or similar)
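
A rough sketch of the exponential backoff part of this idea, in Python rather than Rails for brevity. All constants and the function name are assumptions chosen for illustration: three free posts, a 10-minute starting delay that doubles with each further post, and a cap of three days that echoes the `3.days.ago` window mentioned above.

```python
from datetime import timedelta

FREE_POSTS = 3                      # assumed: posts allowed before any delay applies
BASE_DELAY = timedelta(minutes=10)  # assumed: delay after the free posts are used up
MAX_DELAY = timedelta(days=3)       # assumed cap on the delay
MAX_DOUBLINGS = 20                  # keeps the multiplication well within timedelta range


def required_wait(previous_posts: int) -> timedelta:
    """Minimum gap since the last diary post before another one is published."""
    if previous_posts < FREE_POSTS:
        return timedelta(0)
    doublings = min(previous_posts - FREE_POSTS, MAX_DOUBLINGS)
    return min(BASE_DELAY * (2 ** doublings), MAX_DELAY)


for n in (0, 3, 4, 5, 12):
    print(n, required_wait(n))
```

Whether the count is kept per account, per IP address, or both is left open here, as it is in the comment above.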

tomhughes commented 5 years ago

That would do absolutely nothing to stop the current spam attacks.

natrius commented 3 years ago

There are currently reports in the German forum of more spam-related activity (https://forum.openstreetmap.org/viewtopic.php?pid=822680#p822680). A captcha could lead to problems for visually impaired people. Other than that, the honeypot is a nice first solution.

See https://switching.software/replace/google-recaptcha/ and a longer text about why reCaptcha is probably not needed, https://nearcyan.com/you-probably-dont-need-recaptcha/; several suggestions for alternatives are also listed there. A simple question+answer field looks like it could work, but it's an additional thing everybody has to answer. Then again, the same is true when introducing a captcha.

Restricting PMs for a specific time may help, but is not good for people who register just to create a note and/or want to ask someone legitimate questions.

tomhughes commented 3 years ago

Yes, there's some activity, but it's a tiny number of accounts and quite likely manual, so a captcha won't help at all.

Dimitar5555 commented 10 months ago

> I am strongly against captcha, because it raises the bar dramatically. We should aim to simplify registration process, not make it harder. These spammy accounts do no harm, I suppose, until they vandalize the map.

That quote didn't age very well. 😅

Approximately 5k bot accounts were blocked between 19 and 20 August by SomeoneElse alone. Who knows how many more are sitting and waiting for the "wake up packet". The worst part is that the DWG has to dedicate time to blocking them and people have to dedicate even more time to cleaning up. This work is definitely not pointless, but it would be better if people didn't have to do it. The other problem is that the object versions get inflated very quickly, which could become a serious problem in the future.

Cloudflare offers a (seemingly good) free service called Cloudflare Turnstile. It works differently from ReCaptcha and doesn't require the user to do anything (except to check a box). It can also be made invisible if you don't want the box to be seen.

tomhughes commented 10 months ago

Well, I know because I've been actively working with DWG to deal with those accounts, so thanks for the constructive commentary, but I will now get back to doing useful work.