Spam Prevention Measures

jrgifford commented 11 years ago

So, related to #319, but not really.

Right now, I'd say about 7/8ths of the posts on rstat.us come from spammers. Can we somehow fix this?

colindean commented 11 years ago

Perhaps we can run some kind of report to gather some statistics.

E.g. real users are going to be following at least one person and have at least one post that doesn't have a link in it.

Questions I have:

How many users does rstat.us have?
How many are following at least one person?
How many are followed by at least one person?
How many have a high daily post average? Perhaps more than 10 posts per day since account creation.
How many have at least one post that does not contain a link?
How many have at least one post that is a reply to another user?
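A rough sketch of how such a report could be computed, assuming the user data has already been exported into plain Python objects. The `User` and `Post` shapes, the field names, and the link regex are all hypothetical and are not rstat.us's actual schema; the 10-posts-per-day cutoff is the one floated above.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical record shapes for illustration; the real models will differ.
@dataclass
class Post:
    text: str
    in_reply_to: Optional[str] = None  # username being replied to, if any

@dataclass
class User:
    name: str
    created_at: datetime
    following: set = field(default_factory=set)
    followers: set = field(default_factory=set)
    posts: list = field(default_factory=list)

# Crude link detector (an assumption): anything starting with http:// or https://
LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def posts_per_day(user, now):
    """Average posts per day since account creation (accounts under a day old count as one day)."""
    days = max((now - user.created_at).total_seconds() / 86400, 1.0)
    return len(user.posts) / days

def report(users, now, high_rate=10):
    """Answer each question in the list as a simple count."""
    return {
        "total_users": len(users),
        "following_at_least_one": sum(1 for u in users if u.following),
        "followed_by_at_least_one": sum(1 for u in users if u.followers),
        "high_daily_post_average": sum(
            1 for u in users if posts_per_day(u, now) > high_rate
        ),
        "has_post_without_link": sum(
            1 for u in users if any(not LINK_RE.search(p.text) for p in u.posts)
        ),
        "has_reply": sum(
            1 for u in users if any(p.in_reply_to for p in u.posts)
        ),
    }
```

Calling `report(users, datetime.now())` returns one count per question. The counts alone don't decide who is a spammer; they only show how well each signal separates the two groups.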

wilkie commented 11 years ago

I'm not concerned. It really doesn't matter. We can just highlight some active users and whitelist them for the updates on the front page, but just allow the spammers. They aren't hurting anything, and honestly nobody is going to audit them to avoid the false positives.

rstat.us is meant for running your own small node, so the only feature necessary is the ability to block (blacklist) accounts. You can just disable user creation on your node. The main node isn't an actual priority to me. In reality, it is just a show-and-tell of what it can do. The main node is for trying it out, seeing how it works, but the idea is that it works far better when you run it yourself.

steveklabnik commented 11 years ago

> We can just highlight some active users and whitelist them for the updates on the front page, but just allow the spammers.

That might be a good idea, yeah.

mathias commented 11 years ago

I'll be on the other side of the fence and point out that we've had great success with filtering spam messages with the despamilator gem. We set the spam score threshold rather high, too.

https://github.com/moowahaha/despamilator

wilkie commented 11 years ago

With the type of formatting that people generally employ in status updates (small amount of text, lack of proper grammar, dominated by links), I fear the false-positive rate would be higher than in normal situations. And in a medium that is supposed to be for public communication of such short messages, false positives would be very bad. Since this service is primarily geared toward the ownership of nodes and not the running of a large-scale node, the spam problem exists only when a node receives messages from unknown sources, where manual blocking seems more appropriate. The other side of the fence would consist of visually tagging the 'maybe-spam' messages somehow as they arrive... but that doesn't really help when you still have to audit them anyway.

colindean commented 11 years ago

Another aspect to consider is what effect the spam has on the site's resources, such as dyno time and storage necessary for the spam it's getting. I'm unfamiliar with the costs of operating the node. If spam is increasing the costs of operating significantly, then it may be better to take some kind of action sooner.

Moreover, do we want to consider at all how this spam affects rstat.us' reputation with search engines? I am by no means an expert on the dark arts of SEO, but I've always held a belief that crap content means crap reputation, meaning that useful posts on rstat.us are less likely to be search results for various engines because the signal-to-noise ratio is significantly less than 1.

wilkie commented 11 years ago

We get around 12 requests per minute, which is so light I couldn't possibly care. The data usage is also exceptionally minimal as each status update takes up around 200 bytes. We have no spam policy and so we have no action we could possibly take. We would have to come up with a spam policy and manually enforce it. That's not very realistic so it's just not going to happen.
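For a sense of scale, here is a back-of-the-envelope upper bound built from the two figures above. It assumes, pessimistically, that every one of the 12 requests per minute creates a new 200-byte update; that assumption is not from the thread and almost certainly overstates real growth, since many requests are presumably reads.

```python
REQUESTS_PER_MINUTE = 12   # figure quoted above
BYTES_PER_UPDATE = 200     # figure quoted above

def worst_case_bytes_per_day(requests_per_minute=REQUESTS_PER_MINUTE,
                             bytes_per_update=BYTES_PER_UPDATE):
    """Upper bound on daily storage growth if every request stored one update."""
    return requests_per_minute * 60 * 24 * bytes_per_update

daily = worst_case_bytes_per_day()
print(f"{daily / 1_000_000:.2f} MB/day")         # 3.46 MB/day
print(f"{daily * 365 / 1_000_000_000:.2f} GB/year")  # 1.26 GB/year
```

Even under that worst case, the raw update text grows by roughly a gigabyte a year, before any database overhead or indexes.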

The only thing we could reasonably do is reject search engines from collecting the updates altogether and work on evangelizing the platform to be used externally. And just forget about this type of maintenance on the main node altogether.

colindean commented 11 years ago

What about a periodic purge, since rstat.us is apparently meant to be just a demo? We give some kind of week-ahead notice, then purge accounts (and their posts) whose owners haven't completed some task to prove they're not spammers, e.g. filling out a captcha box or something.

mathias commented 10 years ago

As a maintainer of a small node, I'd like some measure to actually be able to limit signups and prevent spam. Since there's nothing in place to prevent spammers, it's only a matter of time before they find my node and sign up.

I think you'd feel differently if the spam was costing you money every month for MongoDB storage and slowing down the service. I consider manual deletion to be overkill and too much labor.

What about some sort of spam reporting engine with a threshold? At least, if users are reported for spam past a threshold, maybe it can send an email to an admin email address or raise an exception that goes to Airbrake / other exception service and therefore alerts maintainers to the problem user that should be canned.
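A minimal sketch of that idea, not tied to rstat.us's code. The class name, the `notify` callback, and the choice to count distinct reporters (so one user cannot trip the alert alone) are all assumptions for illustration; `notify` stands in for whatever sends the admin email or pings the exception service.

```python
from collections import defaultdict

class SpamReportTracker:
    """Counts distinct reporters per account and alerts once when a threshold is reached."""

    def __init__(self, threshold, notify):
        self.threshold = threshold
        self.notify = notify                 # hypothetical callback: notify(username, report_count)
        self._reporters = defaultdict(set)   # reported username -> set of reporter usernames
        self._alerted = set()                # accounts we have already alerted about

    def report(self, reporter, reported):
        """Record one report; return True if this report triggered the alert."""
        if reporter == reported:
            return False                     # ignore self-reports
        self._reporters[reported].add(reporter)  # a set, so repeat reports don't stack
        count = len(self._reporters[reported])
        if count >= self.threshold and reported not in self._alerted:
            self._alerted.add(reported)
            self.notify(reported, count)
            return True
        return False
```

For example, `SpamReportTracker(threshold=3, notify=email_admin)` (with `email_admin` being a function you supply) would alert the first time three different users report the same account, and stay quiet on later reports of that account.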

wilkie commented 10 years ago

I'd say a valid spam prevention technique for a node would be to 1. disable signup completely and 2. allow blocking of any external account

chadwhitacre commented 9 years ago

I'm looking at the Updates page and it's pretty much all spam. Is Updates supposed to be a page I look at? I expected that to be a feed of the people I'm following.

[Screenshot, 2014-08-15 at 1:31:42 AM]

carols10cents commented 9 years ago

> I'd say a valid spam prevention technique for a node would be to 1. disable signup completely and 2. allow blocking of any external account

Item 1 has been completed; you can now completely disable signups. Leaving this open for item 2.

hotsh / rstat.us

Spam Prevention Measures #748