Open malenki opened 8 years ago
Well spotted!
A captcha on the signup page would at least be a good idea if scripts can be used to create accounts.
@katpatuka Yes, a captcha would be nice, but as far as I know there aren't any non-proprietary, effective captchas available.
I investigated this a few months ago (with respect to the wiki, not this site) and I found https://www.mediawiki.org/wiki/Extension:ConfirmEdit provides a good, up-to-date overview of the options. I don't think we'd go for the only one on that list marked as "high" effectiveness (English-only, potential for adverts), and ReCaptcha always has the problem that by using it, we're helping Google create a proprietary map dataset!
If anyone knows of an effective, non-proprietary captcha then that would be very useful.
Are these accounts actually doing anybody any harm?
We're not a startup that is using user numbers as a measure of success, so if they're just sitting there then who cares?
At the end of the day anything we do will just be a constant arms race where we make life harder and harder for the real users.
If there was a good solution then that would be wonderful, but there isn't, or everybody would be using it!
I am strongly against captcha, because it raises the bar dramatically. We should aim to simplify the registration process, not make it harder. These spammy accounts do no harm, I suppose, until they vandalize the map.
It's meant to raise the bar. Prevention is always easier than getting things cured. If someone doesn't know how to register (a 1-time action) for OSM as it stands now, that person is probably not suited to map around.
Noticed this while joining publiclab.org, and it was pretty fun to fill out. Of course, you need to know English for this.
@planemad I'd be curious to hear from PublicLab if that helped cut down their spam problems
We could use a honeypot or hidden form field that only bots see. https://github.com/markets/invisible_captcha https://github.com/curtis/honeypot-captcha That seems like the simplest and most unobtrusive solution. Could I maybe work on this?
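To make the honeypot idea concrete, here is a minimal, framework-independent sketch of the server-side check. It is not how either linked gem is implemented; the field name `website_url` and the helper `is_probable_bot` are made up for illustration. The assumption is that the signup form contains one extra input that is hidden from humans with CSS, so only scripts that fill in every field will populate it.

```python
from typing import Mapping

# Hypothetical name for the decoy input. It should look plausible to a bot
# but be hidden from human visitors (for example with CSS).
HONEYPOT_FIELD = "website_url"


def is_probable_bot(form: Mapping[str, str]) -> bool:
    """Return True when the hidden decoy field was submitted non-empty.

    A human never sees the field, so it arrives empty or missing.
    A script that fills in every input gives itself away.
    """
    return bool(form.get(HONEYPOT_FIELD, "").strip())


if __name__ == "__main__":
    human = {"email": "mapper@example.org", HONEYPOT_FIELD: ""}
    bot = {"email": "x@example.org", HONEYPOT_FIELD: "http://spam.example"}
    print(is_probable_bot(human), is_probable_bot(bot))
```

A check like this adds nothing for real users to solve, which is why it is unobtrusive, but it only stops scripts that blindly fill in every field.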
@getschomp the honeypot definitely sounds like the smartest idea so far.
> Are these accounts actually doing anybody any harm?
@tomhughes they complicate otherwise easy analysis, so yes; useless accounts are useless for anything
@d1g there is no indication that these accounts are being added automatically, quite the contrary.
A user diary completely spammed with comments. Except for the first comment (which was legit), the rest seem to be by multiple spam accounts.
@simonpoole, I saw how several sites asked users to confirm accounts in the next 3 months or a year. Then they withdrew unused accounts.
We can deactivate unused accounts (without any edits or comments since 2004) once.
A clear public announcement beforehand is a must, of course.
> We can deactivate unused accounts (without any edits or comments since 2004) once.
What would be the benefit?
@d1g I would be (very very) strongly against doing that. Every day we have users that "reactivate" their pre-licence change account by accepting the CTs: 400 since the beginning of this year alone, thousands over the past years.
All these accounts were last used -before- May 2010. There is no reason to believe that this pattern is different for more recent accounts either, so by forcibly removing them we would simply be shooting ourselves in the foot.
@HolgerJeromin, to filter out scripted accounts (they semi-defeat the benefits of registration). It may work for very active communities. Credibility of users matters in OSM, probably even more than in Wikimedia projects.
I can buy hundreds of accounts virtually anywhere on the black market. Doesn't mean they would be used for anything good.
But, as Simon pointed out, it is painful in OSM: logins can span years.
One area this manifests in is via the diary feature (and corresponding RSS feeds), and we now have the default of review-my-changesets set for newer accounts.
Could we optionally use a (potentially proprietary) captcha on a new diary post; when:
Options such as https://github.com/desirepath41/visualCaptcha do exist now; even if they aren't maintained at the moment.
With very specific criteria, we could avoid accidentally adding barriers for new editors; and only mildly inconvenience people who jump into editing via JOSM or other editors.
Another potential option: rate limit diary posts per account/IP address after the first .. 3-4? 10? with an exponential backoff. Large institutions, VPN users and similar may be slightly affected; but this could be done by excluding posts from the RSS feed (modelling a diary "publish at" timestamp, and only selecting posts to publish between Time.now and 3.days.ago or similar)
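As a rough sketch of what that could look like (in Python rather than the site's Rails code, with every constant an assumption rather than a proposal): the first few posts are free, after which the required gap between posts doubles each time up to a cap. The second helper models the "publish at" idea by only selecting posts whose timestamp falls inside the feed window.

```python
from datetime import datetime, timedelta

# All three values are placeholders for illustration only.
FREE_POSTS = 3                      # posts allowed before any delay applies
BASE_DELAY = timedelta(minutes=10)  # delay required after the free posts
MAX_DELAY = timedelta(days=3)       # cap on the backoff


def required_wait(previous_posts: int) -> timedelta:
    """Minimum gap since the last post, doubling after the free posts."""
    if previous_posts < FREE_POSTS:
        return timedelta(0)
    # Clamp the exponent so the multiplication cannot overflow timedelta.
    exponent = min(previous_posts - FREE_POSTS, 20)
    return min(BASE_DELAY * (2 ** exponent), MAX_DELAY)


def may_post(previous_posts: int, last_post_at: datetime, now: datetime) -> bool:
    """True when enough time has passed since the account's last post."""
    return now - last_post_at >= required_wait(previous_posts)


def in_feed(publish_at: datetime, now: datetime,
            window: timedelta = timedelta(days=3)) -> bool:
    """True when a post's publish_at lies between (now - window) and now."""
    return now - window <= publish_at <= now
```

Keying the counter on account, IP address, or both is a separate decision; the sketch only covers the timing.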
That would do absolutely nothing to stop the current spam attacks.
There are currently reports in the German forum of more spam-related activity (https://forum.openstreetmap.org/viewtopic.php?pid=822680#p822680). A captcha could cause problems for visually impaired people. Other than that, the honeypot is a nice first solution.
See https://switching.software/replace/google-recaptcha/ and a longer text about why reCaptcha is probably not needed, https://nearcyan.com/you-probably-dont-need-recaptcha/, which also lists several suggestions for alternatives. A simple question+answer field looks like it could work, but it's an additional thing everybody has to answer. The same is true when introducing a captcha, though.
Restricting PM for a specific time may help, but is not good for people who register just to create a note and/or want to ask someone legitimate questions.
Yes there's some activity but it's a tiny number of accounts and quite likely manual so a captcha won't help at all.
> I am strongly against captcha, because it raises the bar dramatically. We should aim to simplify the registration process, not make it harder. These spammy accounts do no harm, I suppose, until they vandalize the map.
That quote didn't age very well. 😅
Approximately 5k bot accounts were blocked between 19 and 20 August by SomeoneElse alone. Who knows how many more are sitting and waiting for the "wake up packet". The worst part is that the DWG has to dedicate time for blocking them and people have to dedicate even more time on cleaning up. This work is definitely not pointless but it would be better if people don't have to do it. The other problem is that the object versions get inflated very quickly which could become a serious problem in the future.
Cloudflare offers a (seemingly good) free service called Cloudflare Turnstile. It works differently from ReCaptcha and doesn't require the user to do anything (except to check a box). It can also be made invisible if you don't want the box to be seen.
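For reference, the server side of Turnstile is a single check: the widget adds a token to the submitted form, and the backend posts that token together with a secret key to Cloudflare's `siteverify` endpoint. The sketch below uses only the Python standard library. The endpoint URL and the `secret`, `response`, `remoteip` and `success` field names are as I understand them from Cloudflare's documentation and should be checked against the current docs. The helper names are my own.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Endpoint as I understand it from Cloudflare's docs; verify before relying on it.
SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_siteverify_body(secret: str, token: str,
                          remote_ip: Optional[str] = None) -> bytes:
    """Form-encode the fields the siteverify endpoint expects."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    return urllib.parse.urlencode(fields).encode("utf-8")


def token_is_valid(response_json: str) -> bool:
    """Interpret the JSON reply: only an explicit success=true passes."""
    return json.loads(response_json).get("success") is True


def verify_token(secret: str, token: str,
                 remote_ip: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Ask Cloudflare whether the token from the signup form is genuine."""
    request = urllib.request.Request(
        SITEVERIFY_URL,
        data=build_siteverify_body(secret, token, remote_ip),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return token_is_valid(reply.read().decode("utf-8"))
```

Note that this makes every signup depend on a third-party service, which is the same kind of trade-off raised earlier in the thread about ReCaptcha.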
Well, I know because I've been actively working with DWG to deal with those accounts, so thanks for the constructive commentary, but I will now get back to doing useful work.
Having had a look at user accounts created from November 05 to 09 (that is, during the last 108 hours) there are 11318 accounts. A simple
gives 6576 occurrences of names like this:
There are also some false positives, but a more strict
still has 4367 hits. Looking for names with spam in them:
gives a quite meagre 30 results.
With the above I only want to show that, at least during the last four days, around 50% of the newly registered accounts seem to have been created by bots/scripts and presumably won't be used for the benefit of OSM. Rather than admins having to remove these users by hand, it would save work if scripts had a harder time registering accounts.
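The commands and sample names did not survive in this copy of the post, so the following is only a hypothetical illustration of the kind of check described, not the patterns actually used. It assumes a list of display names; the three regular expressions (a loose one, a stricter one, and a search for "spam") and the sample names are invented for the example. The `share` helper reproduces the percentages behind the "around 50%" estimate from the counts quoted above.

```python
import re
from typing import Iterable, Pattern

# Invented patterns for illustration; the original ones are unknown.
LOOSE = re.compile(r"\d$")                             # name ends in a digit
STRICT = re.compile(r"^[a-z]+\d{5,}$", re.IGNORECASE)  # letters, then 5+ digits
SPAM = re.compile(r"spam", re.IGNORECASE)              # contains "spam"


def count_matches(names: Iterable[str], pattern: Pattern[str]) -> int:
    """Count the names in which the pattern is found."""
    return sum(1 for name in names if pattern.search(name))


def share(count: int, total: int) -> float:
    """Percentage of count in total, rounded to one decimal place."""
    return round(100 * count / total, 1) if total else 0.0


if __name__ == "__main__":
    # Made-up names; "Anna1990" shows how the loose pattern can hit a real user.
    sample = ["alice", "mapper_bob", "qwerty12345", "Anna1990", "spamking"]
    print(count_matches(sample, LOOSE), count_matches(sample, STRICT))
    # Counts quoted in the post: 6576 and 4367 out of 11318 accounts.
    print(share(6576, 11318), share(4367, 11318))  # 58.1 38.6
```

With the quoted counts, the loose match covers about 58% of the new accounts and the stricter one about 39%, which brackets the "around 50%" figure.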
I know that it is easy to say "I'd like to have" but hard to solve the issue. Though I hope you can find a solution.
See also: #841 My diary entry resulting in this issue. The data sample I used can be found here.