Open Firefishy opened 3 weeks ago
Related: #4694 (avoiding huge texts in general)
There are plenty of spammers who post short texts as well, and any attempt to roll our own spam filtering is pretty much doomed to failure really. The only thing likely to achieve anything is something like #4314 or a shared service like Akismet.
> There are plenty of spammers who post short texts
I don't disagree. But reducing the amount of text they can post helps with the reasons above and will likely help Bayes or other text qualifiers.
How would reducing the amount of text "help Bayes or other text qualifiers"? You've already made a classification here: long text && new account = likely spam.
> How would reducing the amount of text "help Bayes or other text qualifiers"?
Spammers normally want to get particular words into their spam message, e.g. "Buy Viagra Here". The rest of the text is often filler, AI-generated or just garbage. A Bayesian model would likely score "Buy", "Viagra" and "Here" much higher than the rest of the text. The additional garbage text might even cause the Bayesian model to miss the spam. Hypothetical until tested ;-)
Regardless, shorter descriptions help admins (like me) sort through accounts and identify spam.
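To make the dilution argument concrete, here is a minimal sketch of a naive Bayes log-odds score. The per-word probabilities are made up for illustration and not learned from any real corpus, and `spam_log_odds` is a hypothetical helper, not something from the codebase. It assumes filler words lean slightly towards legitimate text, which is exactly the part that would need testing.

```python
import math

# Made-up likelihoods P(word | spam) and P(word | ham); illustrative only.
P_SPAM = {"buy": 0.05, "viagra": 0.04, "here": 0.02}
P_HAM = {"buy": 0.005, "viagra": 0.0001, "here": 0.02}

# Assumed likelihoods for any other (filler) word: slightly ham-leaning.
FILLER_SPAM = 0.001
FILLER_HAM = 0.003


def spam_log_odds(words, prior_spam=0.5):
    """Prior log-odds plus the sum of per-word log likelihood ratios."""
    score = math.log(prior_spam / (1 - prior_spam))
    for word in words:
        word = word.lower()
        p_spam = P_SPAM.get(word, FILLER_SPAM)
        p_ham = P_HAM.get(word, FILLER_HAM)
        score += math.log(p_spam / p_ham)
    return score


short = "Buy Viagra Here".split()
padded = short + ["lorem"] * 10  # same payload plus ten filler words

print(spam_log_odds(short))   # positive: leans spam
print(spam_log_odds(padded))  # negative: filler outweighs the payload
```

With these invented numbers the three payload words push the score well above zero, and ten filler words are enough to drag it below zero. Whether real filler behaves this way depends entirely on the training data.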
Discourse uses how quickly words were "typed" as a likely spam qualifier.
> You've already made a classification here: long text && new account = likely spam.
There are MANY other classifications. E.g. the number of dots in the local part of a Gmail address (number.of.dots.in.email.before@gmail.com) is a very good qualifier.
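A rough sketch of that qualifier, assuming it means counting dots before the `@` in Gmail addresses (Gmail ignores those dots, so one mailbox can register under many dotted variants). The function names and the threshold of 4 are hypothetical and would need tuning against real signups.

```python
def gmail_local_dot_count(email):
    """Count dots in the local part of a Gmail address; 0 for anything else."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or domain not in ("gmail.com", "googlemail.com"):
        return 0
    return local.count(".")


def looks_dotted(email, threshold=4):
    """Hypothetical cutoff: flag addresses with at least `threshold` dots."""
    return gmail_local_dot_count(email) >= threshold


print(gmail_local_dot_count("number.of.dots.in.email.before@gmail.com"))
print(looks_dotted("jane.doe@gmail.com"))
```

On its own this is only one signal. It would presumably be combined with others, such as account age and description length, rather than used to block signups outright.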
Problem
The user profile descriptions are commonly used by spammers to post spam.
Spammers often post extremely long descriptions within a few minutes of the user account being created.
Problems:
Description
New users (accounts < 24 hours old?) should only be allowed to post short (512 characters?) user descriptions.
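The proposed rule could look roughly like the sketch below. It is written in Python purely for illustration and is not the project's actual validation code. `description_allowed` is a hypothetical name, and both constants are the tentative values from this issue.

```python
from datetime import datetime, timedelta, timezone

NEW_ACCOUNT_AGE = timedelta(hours=24)  # "< 24 hours?" from the proposal
NEW_ACCOUNT_MAX_CHARS = 512            # "512 characters?" from the proposal


def description_allowed(description, created_at, now=None):
    """Return True if the description is acceptable for the account's age."""
    now = now or datetime.now(timezone.utc)
    is_new = now - created_at < NEW_ACCOUNT_AGE
    # Older accounts are not restricted here; any general limit (see #4694)
    # would be enforced separately.
    return not is_new or len(description) <= NEW_ACCOUNT_MAX_CHARS


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
fresh = now - timedelta(hours=1)
print(description_allowed("x" * 513, fresh, now))  # too long for a new account
print(description_allowed("x" * 512, fresh, now))  # within the limit
```

Passing `now` explicitly keeps the check easy to test. In production it would default to the current time.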
Screenshots