openstreetmap / openstreetmap-website

The Rails application that powers OpenStreetMap
https://www.openstreetmap.org/
GNU General Public License v2.0

New users should have a short limit on their user profile descriptions #5098

Open Firefishy opened 3 weeks ago

Firefishy commented 3 weeks ago

Problem

The user profile descriptions are commonly used by spammers to post spam.

Spammers often post extremely long descriptions within a few minutes of the user account being created.

Problems:

Description

New users (< 24 hours old?) should only be allowed to post short (512 characters?) user descriptions.
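The proposed rule boils down to a length check gated on account age. Below is a minimal, language-neutral sketch that assumes the account's creation time is available. Both thresholds are the tentative values from this issue, and `description_allowed` is a hypothetical helper. The real site is a Rails app, so an actual change would look different.

```python
from datetime import datetime, timedelta, timezone

# Tentative values floated in this issue; both are open questions, not settled policy.
NEW_ACCOUNT_WINDOW = timedelta(hours=24)
NEW_ACCOUNT_MAX_DESCRIPTION = 512  # characters

def description_allowed(description, account_created_at, now=None):
    """Hypothetical check: is a profile description of this length allowed?"""
    now = now or datetime.now(timezone.utc)
    if now - account_created_at < NEW_ACCOUNT_WINDOW:
        # Account is still "new": enforce the short limit.
        return len(description) <= NEW_ACCOUNT_MAX_DESCRIPTION
    return True  # established accounts are unaffected by this rule
```

A 600-character description five minutes after sign-up would be rejected, but the same text the next day would pass. The rule only slows spammers down; it does not classify anything.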

Screenshots

[Screenshot 2024-08-20 at 09 15 23]

mmd-osm commented 3 weeks ago

Related: #4694 (avoiding huge texts in general)

tomhughes commented 3 weeks ago

There are plenty of spammers who post short texts as well, and any attempt to roll our own spam filtering is pretty much doomed to failure really; the only thing that will achieve anything is likely to be something like #4314 or using a shared service like Akismet.

Firefishy commented 3 weeks ago

There are plenty of spammers who post short texts

I don't disagree. But reducing the amount of text they can post helps with the reasons above and will likely help Bayes or other text qualifiers.

AntonKhorev commented 3 weeks ago

How would reducing the amount of text "help Bayes or other text qualifiers"? You've already made a classification here: long text && new account = likely spam.

Firefishy commented 3 weeks ago

How would reducing the amount of text "help Bayes or other text qualifiers"?

Spammers normally want to get particular words into their spam message, e.g. "Buy Viagra Here". The rest of the text is often filler, AI-generated or just garbage. A Bayesian model would likely score "Buy", "Viagra" and "Here" much higher than the rest of the text. The additional garbage text might even cause the Bayesian model to miss the spam. Hypothetical until tested ;-)
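The dilution effect can be illustrated with a toy filter that roughly follows Paul Graham's "A Plan for Spam". That approach combines only the most "interesting" token probabilities, meaning those furthest from 0.5. The token probabilities below are invented, and `spam_score` is hypothetical, not a filter OSM runs. The sketch also assumes the filler happens to look like legitimate mapping talk.

```python
import math

# Invented P(spam | token) values, as a trained filter might learn them.
TOKEN_SPAM_PROB = {
    "buy": 0.90, "viagra": 0.99, "here": 0.70,
    "mapping": 0.10, "footpath": 0.05, "survey": 0.08,
}
UNKNOWN_PROB = 0.4  # slightly ham-leaning default for unseen tokens

def spam_score(text, max_tokens=15):
    """Combine the most 'interesting' token probabilities into one spam score."""
    tokens = set(text.lower().split())
    probs = [TOKEN_SPAM_PROB.get(t, UNKNOWN_PROB) for t in tokens]
    # Keep only the tokens furthest from neutral (0.5).
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:max_tokens]
    spam = math.prod(top)
    ham = math.prod(1 - p for p in top)
    return spam / (spam + ham)

print(spam_score("Buy Viagra Here"))                          # close to 1
print(spam_score("Buy Viagra Here mapping footpath survey"))  # drops to about 0.51
```

With only the three spam words, the score is near certainty. Adding three ham-looking words drags it to roughly a coin flip. This is the "Bayesian poisoning" effect that a length cap would limit.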

Regardless, shorter descriptions help admins (like me) sort through accounts and identify spam.

Discourse uses how quickly the text was "typed" as a spam qualifier.
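A typing-speed signal could be sketched as follows. This assumes the server records how long the profile form was open before submission, which is a hypothetical input. The helper `typed_too_fast` and the 15 characters/second threshold are illustrative only; they are not Discourse's implementation or settings.

```python
def typed_too_fast(text, seconds_on_form, max_chars_per_second=15.0):
    """Hypothetical check: flag text submitted faster than a human plausibly types."""
    if seconds_on_form <= 0:
        return True  # no measurable time on the form at all is itself suspicious
    return len(text) / seconds_on_form > max_chars_per_second
```

One trade-off is that a genuine user pasting a prepared bio would also trip this check. It therefore fits better as one qualifier among several than as a hard block.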

You've already made a classification here: long text && new account = likely spam.

There are MANY other classifications, e.g. the number of dots in a Gmail address before the @ (number.of.dots.in.email.before@gmail.com) is a very good qualifier.
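Gmail ignores dots and any `+tag` in the local part of gmail.com and googlemail.com addresses. One inbox can therefore register many superficially different accounts. Below is a minimal sketch of the dot-count qualifier plus a canonical form for grouping such sign-ups. The helper names are hypothetical, and any cut-off on the dot count would need tuning against real data.

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}

def split_email(email):
    """Split a normalised address into (local part, domain)."""
    local, _, domain = email.strip().lower().rpartition("@")
    return local, domain

def gmail_dot_count(email):
    """Number of dots in the local part of a Gmail address (0 for other domains)."""
    local, domain = split_email(email)
    return local.count(".") if domain in GMAIL_DOMAINS else 0

def gmail_canonical(email):
    """Address with dots and any +tag removed, so dot variants map to one inbox."""
    local, domain = split_email(email)
    if domain not in GMAIL_DOMAINS:
        return email.strip().lower()
    return local.split("+", 1)[0].replace(".", "") + "@gmail.com"
```

For example, `gmail_dot_count("number.of.dots.in.email.before@gmail.com")` gives 5. Meanwhile `gmail_canonical` maps `n.u.m.b.e.r@gmail.com` and `number@gmail.com` to the same address.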