spam filtering (GitHub issue)

posativ commented 11 years ago

It's surprisingly difficult to find usable, general-purpose spam filter software. Here's what's currently available:

DSPAM – requires a daemon, dafuq?
CRM114 – bloated; there's a stripped-down, abandoned ANSI C implementation with Python support, but it doesn't compile on my system.
Bayes classifiers – most are more than 4 years old, lack a test suite, etc.

Probably DIY: http://crm114.sourceforge.net/docs/classify_details.txt
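For the DIY route, the core of a token-based Bayesian filter is small. Below is a minimal multinomial naive Bayes sketch in Python (the language Isso's server is written in). It assumes a naive lowercase word tokenizer and add-one smoothing. It is not CRM114's classifier and not anything Isso ships; the names `tokenize` and `NaiveBayes` are made up for illustration:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log prior P(label) ...
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        # ... plus smoothed log likelihoods, so an unseen word never zeroes the product
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + vocab_size))
        return score

    def spam_probability(self, text):
        """P(spam | text); both classes need at least one training document."""
        tokens = tokenize(text)
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        s = self._log_score(tokens, "spam", vocab_size)
        h = self._log_score(tokens, "ham", vocab_size)
        # normalise in log space (subtract the max) to avoid exp() overflow
        m = max(s, h)
        return math.exp(s - m) / (math.exp(s - m) + math.exp(h - m))


nb = NaiveBayes()
nb.train("buy cheap pills now", "spam")
nb.train("great post, thanks for sharing", "ham")
print(nb.spam_probability("cheap pills") > 0.5)  # True
```

Every `train()` call shifts the counts, so training only on moderator-confirmed labels is a plausible way to limit the spam -> ham poisoning mentioned in the next comment.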

posativ commented 11 years ago

spam – "BAYESIAN SPAM DETECTOR", but it warns about spam -> ham poisoning

noqqe commented 11 years ago

You could give bogofilter a try as well.

srijan commented 10 years ago

What about using an external service for this (like akismet)?

posativ commented 10 years ago

External services can be implemented if they are not required for Isso (Akismet is a US service). Nevertheless, spam filtering should by default not rely on any third-party provider (as it defeats the purpose of self-hosting).

srijan commented 10 years ago

Yeah. I meant as an option, not as default (of course).

Lux-Delux commented 10 years ago

Perhaps something along the lines of this WordPress plugin: http://web-profile.com.ua/wordpress/plugins/anti-spam-pro/

It simply asks a question that you have to answer if JavaScript is disabled; otherwise you don't even notice it.

posativ commented 10 years ago

This plugin is quite useless, as it fails as soon as bots begin to interpret JS (which is not that hard, but requires some computation power, I guess). But similar to the plugin, Isso is currently not affected by spam (e.g. my demo site does not receive any) because most bots are not capable of evaluating JavaScript, and if they are, they hopefully abort the computation because PBKDF2 takes too long.

However, a targeted attack which uses the public API might be an issue someday.
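Against a targeted attack on the public API (and the CPU exhaustion described further down), a cheap first line of defense is limiting how fast each client may post, before any classifier runs. Below is a sketch of a per-client token bucket, assuming the client key is some (possibly anonymized) IP address. `TokenBucket` and its parameters are hypothetical, not Isso configuration, and the rates are arbitrary demo values:

```python
import time


class TokenBucket:
    """Per-client token bucket: refills `rate` tokens per second, stores at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock    # injectable so the sketch is easy to test
        self.buckets = {}     # client key -> (tokens, timestamp of last update)

    def allow(self, key):
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.capacity, now))
        # refill in proportion to the elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False


# usage with a fake clock: a burst of 3 comments, then one more every 2 seconds
fake_now = [0.0]
limiter = TokenBucket(rate=0.5, capacity=3, clock=lambda: fake_now[0])
print([limiter.allow("client-a") for _ in range(4)])  # [True, True, True, False]
fake_now[0] += 2.0
print(limiter.allow("client-a"))  # True
```

A real deployment would also need to evict idle keys, since `buckets` grows with every new client.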

silverhook commented 10 years ago

I’m considering Isso (or Discourse) as an option to enable comments again on my Pelican blog.

I agree with @posativ that because of the matter of trust, this should be self-hosted. What I’m still worried about is how much a spam filter would hammer my poor little ARM server.

The reason why I disabled comments (and moved away from the otherwise very nice Habari) is that spammers would effectively DoS my server, since the spam filter (Bayesian + honeypot) would just consume way too much CPU time.

If spam filtering can be done in a not too expensive local way or in a distributed way that can be trusted, I would be very happy to have comments (very likely with Isso) enabled again.

posativ commented 10 years ago

I am not aware of any (real) spam, neither in my personal blog nor in the demo. @noqqe reported that he didn't receive a single spam comment in over a year, too.

That's probably because Isso is still quite unknown and is written in JavaScript instead of being a pre-rendered HTML snippet. Or the JS interpreter of typical spam robots is broken.

castarco commented 9 years ago

Hi, I'm interested in contributing spam filtering. Captcha systems aren't enough, since there are paid people doing manual spam, not only bots (personally, on my old blog I received a lot of spam, sometimes coming from real humans).

As for the spam filtering system, we could use support vector machines instead of Bayesian filters. They are relatively efficient after the training phase, so DoS attacks are unlikely to succeed.
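The efficiency argument is that once a linear SVM is trained, classifying a comment is a single sparse dot product. Below is a sketch that trains one with the Pegasos stochastic sub-gradient method (a swapped-in, well-known way to train a linear SVM; nothing in the thread prescribes it), using binary bag-of-words features and labels +1 for spam and -1 for ham. The function names and the `lam`/`epochs` values are illustrative assumptions:

```python
import random
import re


def features(text):
    """Binary bag-of-words features plus a bias term (a simplifying assumption)."""
    feats = {tok: 1.0 for tok in re.findall(r"[a-z0-9']+", text.lower())}
    feats["__bias__"] = 1.0
    return feats


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(samples, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos SGD updates."""
    rng = random.Random(seed)
    data = [(features(text), y) for text, y in samples]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)        # uses the weights before this step's update
            for k in w:                   # L2 shrinkage of every weight
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:              # hinge-loss sub-gradient for margin violations
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def is_spam(w, text):
    # prediction is one sparse dot product, which is what keeps it cheap
    return dot(w, features(text)) > 0.0


training = [
    ("buy cheap pills now", +1),
    ("cheap watches buy now", +1),
    ("thanks for the great post", -1),
    ("i disagree with your point about caching", -1),
]
w = train_pegasos(training)
print(is_spam(w, "buy cheap pills now"), is_spam(w, "thanks for the great post"))  # True False
```

Rescaling every weight on each step keeps the sketch short but costs O(|w|) per update; a practical version would keep a separate scalar multiplier for `w` instead.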

fluffy-critter commented 4 years ago

Lately my site (running isso) has started to receive quite the onslaught of spam. I don't know how much is human-posted and how much is from JavaScript-aware automation, but either way, some of them are even gloating about "easiest captcha ever" in their comments, as if their comments were going to be seen by the public or indexed by a search engine. (Also, why the hell haven't bots figured out that everyone has used rel="nofollow" for like 20 years?)

Anyway. Yeah. A plugin system would be great. I'm willing to spend some time working on one.

onaralili commented 3 years ago

> External services can be implemented if they are not required for Isso (Akismet is a US service). Nevertheless, spam filtering should by default not rely on any third-party provider (as it defeats the purpose of self-hosting).

It would be great to have a plugin system to integrate with third parties. As an alternative to Akismet, there is OOPSpam, which is GDPR compliant.

ix5 commented 2 years ago

> Anyway. Yeah. A plugin system would be great. I'm willing to spend some time working on one.

@fluffy-critter did you eventually come up with something?

fluffy-critter commented 2 years ago

I haven't had time/energy to work on anything, unfortunately.

taoeffect commented 2 years ago

Hey guys: consider PoW as a simpler means of spam filtering:
* https://git.sequentialread.com/forest/pow-captcha
* https://mcaptcha.org/

ix5 commented 2 years ago

> I haven't had time/energy to work on anything, unfortunately.

No worries, I was just curious. This is not too important anyway.

In general, a plugin API would be neat to have. My idea would be to extend the signals system to trigger spam detection upon a new comment.
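A rough sketch of what such a hook could look like: a tiny publish/subscribe registry where spam checks subscribe to a new-comment event and return a verdict. `SignalBus`, the `"comments.new"` event name, the link-count check and the verdict strings are all hypothetical; Isso's actual signal names and handler signatures may differ:

```python
from collections import defaultdict


class SignalBus:
    """Tiny publish/subscribe registry (hypothetical, not Isso's actual API)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def connect(self, name, handler):
        self._handlers[name].append(handler)

    def emit(self, name, *args):
        # collect every handler's return value so the caller can act on the verdicts
        return [handler(*args) for handler in self._handlers[name]]


signals = SignalBus()


def link_count_check(comment):
    """Example plugin: flag comments with many links (the threshold is an assumption)."""
    return "moderate" if comment["text"].count("http") > 2 else "accept"


signals.connect("comments.new", link_count_check)


def on_new_comment(comment):
    verdicts = signals.emit("comments.new", comment)
    # strictest verdict wins: any "moderate" sends the comment to the queue
    return "moderate" if "moderate" in verdicts else "accept"


print(on_new_comment({"text": "Nice write-up, thanks!"}))      # accept
print(on_new_comment({"text": "http://a http://b http://c"}))  # moderate
```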

ix5 commented 2 years ago

> Hey guys: consider PoW as a simpler means of spam filtering:
> * https://git.sequentialread.com/forest/pow-captcha
> * https://mcaptcha.org/

Those two look interesting, but heads up: they require modern browsers and WebAssembly (wasm) support.

fluffy-critter commented 2 years ago

Also I'm not sure what problem that actually solves, beyond making sure someone's idle on a page for a certain amount of time before they submit. Most of the spam I get appears to be submitted by humans who are paid money to defeat CAPTCHAs, as has been the case with most comment spam for at least the past decade.

taoeffect commented 2 years ago

It would cut down on some spam, probably not all. You could most likely increase the difficulty in the settings, etc. It's simple to set up. There are tradeoffs with everything. If you really wanted to prevent all spam (and also some legitimate comments), you could charge micropayments over the Lightning Network. :P
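For reference, a hashcash-style proof of work fits in a few lines. The server issues a random challenge, the client searches for a nonce whose SHA-256 hash starts with `difficulty` zero bits, and the server verifies with a single hash. This is a generic sketch, not necessarily the hash function or protocol used by pow-captcha or mCaptcha, and all names are hypothetical:

```python
import hashlib
import os


def leading_zero_bits(digest):
    """Count the leading zero bits of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def new_challenge():
    # random server-issued challenge; a real server should also make it single-use and expiring
    return os.urandom(16).hex()


def solve(challenge, difficulty):
    """Client side: brute-force a nonce; expected work is about 2**difficulty hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge, nonce, difficulty):
    """Server side: one hash, so checking stays cheap at any difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


challenge = new_challenge()
nonce = solve(challenge, 16)          # about 65,536 hashes on average
print(verify(challenge, nonce, 16))   # True
```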

ForestJohnson commented 11 months ago

Hi, I just heard about this project and it looks nice! In the past I made a similar project of my own, and I created git.sequentialread.com/forest/pow-captcha for it. If you would like to try it out, it's hosted here:

https://sequentialread.com/now-with-comments/#sqr-comment-container

I have also seen spam from humans. There were some SEO spammers trying to register accounts on our Gitea server and post links to their clients' businesses. We were not able to stop them until we required an invite token for registration :(

To be honest I'm not sure what to do about that kind of spam besides putting the comments into a moderation queue and having someone look at them.

My "pow-captcha" is not actually a captcha at all; I think of it as just a bot deterrent. Unfortunately it's also a deterrent for people who run customized browsers with anti-fingerprinting or new features disabled. I use it as a bot deterrent for other things, and I have seen my friends blocked by it because the privacy browser they use on their phone would not allow Web Workers etc. :(

I think unfortunately spam is always going to be impossible to stop automatically with high accuracy. Maybe for the next version of my site I will try out isso for comments but make it skip the moderation queue if the browser was able to solve the pow challenge.

> making sure someone's idle on a page for a certain amount of time before they submit.

The animated GIF in the README is sort of intentionally slowed down; it was recorded at a higher difficulty setting than the one I use on my site. I think you would have to pull out a cell phone from 8-10 years ago to see it go that slow on my site. On a new computer it's so fast that it can barely even render the progress bar before it's done.
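The speed gap between old phones and new computers follows from the work model. Assume a hashcash-style check that needs $d$ leading zero bits from an ideal hash (pow-captcha's actual scheme may differ, so treat the constants loosely). Each attempt then succeeds independently with probability $2^{-d}$, so the number of attempts $N$ is geometric, and the expected solve time scales inversely with the device's hash rate $r$:

$$
\Pr[\text{attempt succeeds}] = 2^{-d}, \qquad
\mathbb{E}[N] = \frac{1}{2^{-d}} = 2^{d}, \qquad
\mathbb{E}[T_{\text{solve}}] \approx \frac{2^{d}}{r}
$$

Each extra bit doubles the expected time on every device. A device with a tenth of the hash rate takes about ten times as long at the same $d$, which is why one difficulty setting can be near-instant on a new computer yet visibly slow on an old phone.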

fluffy-critter commented 11 months ago

Yeah there's no way to automatically get rid of all spam, but it's still nice to have a means of being able to classify things to apply different moderation policies to them, and possibly be able to specifically whitelist known-good posters so their comments go up immediately.
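The two ideas in this exchange (skip the moderation queue when the PoW challenge was solved, and let known-good posters through immediately) combine naturally into a small policy function. The sketch below assumes a spam score in [0, 1] from some classifier and an `author_key` that identifies a verified poster; a plain, unverified e-mail address would be trivially spoofable. `CommentInfo`, `moderation_policy` and the thresholds are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CommentInfo:
    author_key: str     # hypothetical identifier of a verified poster
    pow_solved: bool    # did the client solve the proof-of-work challenge?
    spam_score: float   # 0.0 = surely ham, 1.0 = surely spam (from any classifier)


def moderation_policy(info, whitelist, reject_above=0.95):
    """Map a comment's signals to 'publish', 'moderate' or 'reject' (thresholds are assumptions)."""
    if info.author_key in whitelist:
        return "publish"      # known-good posters skip every other check
    if info.spam_score >= reject_above:
        return "reject"
    if info.pow_solved and info.spam_score < 0.5:
        return "publish"      # passed the bot deterrent and looks like ham
    return "moderate"         # everything else waits for a human


trusted = {"alice"}
print(moderation_policy(CommentInfo("alice", False, 0.99), trusted))  # publish
print(moderation_policy(CommentInfo("bob", True, 0.10), trusted))     # publish
print(moderation_policy(CommentInfo("bob", False, 0.10), trusted))    # moderate
```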

isso-comments / isso

spam filtering #11