[Feature Request] "bot-infested" instance detection

sunaurus commented 1 year ago

There are currently a huge number of bots signing up on smaller instances, especially instances with no captcha + no e-mail verification.

It would be quite useful to be able to detect and filter for such instances on lemmyverse.net, perhaps by checking for discrepancies between user counts and post counts? Or maybe by checking for instances with massive user growth but without a similar growth in post count?
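To make the two checks above concrete, here is a minimal sketch. The `InstanceStats` shape, the function names and every threshold are assumptions for illustration only; none of them come from lemmyverse.net or the Lemmy API.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    # Hypothetical snapshot of one instance; field names are assumptions.
    name: str
    users: int
    posts: int


def looks_inflated(stats: InstanceStats,
                   max_users_per_post: float = 50.0,
                   min_users: int = 500) -> bool:
    """Flag instances whose user count is far out of proportion to post count."""
    if stats.users < min_users:
        # Too small to judge; avoids flagging brand-new instances.
        return False
    # Treat zero posts as one post to avoid division by zero.
    return stats.users / max(stats.posts, 1) > max_users_per_post


def growth_mismatch(prev: InstanceStats, curr: InstanceStats,
                    min_new_users: int = 1000,
                    max_new_users_per_new_post: float = 50.0) -> bool:
    """Flag large user growth between two snapshots without matching post growth."""
    new_users = curr.users - prev.users
    new_posts = max(curr.posts - prev.posts, 0)
    if new_users < min_new_users:
        return False
    return new_users / max(new_posts, 1) > max_new_users_per_new_post
```

Both checks can produce false positives (for example, a lurker-heavy instance), so the thresholds would need tuning against real data.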

sunaurus commented 1 year ago

Post about the current situation on lemmy.ml: https://lemmy.ml/post/1391903

tgxn commented 1 year ago

Hmm, I understand the basic idea here, and I actually saw it over on Lemmy first :D But I'm not convinced it's something I should be patching.

Each instance is in charge of its own rules, configuration and user base. I've seen lots of sites grow a large amount over the past few weeks; this could be partially due to the Reddit exodus, so I'm not making any assumptions. I've not seen any spam lately, maybe I'm just lucky. 🤷‍♂️ (I'm now tracking site->user over time, so in future I'll add a "user growth" metric somewhere.)

I had a look through your spreadsheet, and I can see there are a few valid sites - voyager.lemmy.ml to name one - while there are others that are definitely sketchy, such as Podycust. Hexbear is one which I noticed wasn't on your list (they are running an invalid federation config).

I'm planning a "trust score" feature to calculate a score for each instance to assist in sorting/ranking them, which will include things like user count, server version, user growth, post growth, etc. The scoring algo is already why you don't see Lemmygrad and lemmynsfw on the front page.
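As a rough illustration of how the inputs listed above could be combined into one number, here is a sketch. It is not the scoring algo used on the site; the `InstanceSnapshot` fields, the weights and the penalty formula are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceSnapshot:
    # Hypothetical inputs; names are assumptions, not the site's schema.
    users: int
    version_is_current: bool
    user_growth: int   # new users over some recent window
    post_growth: int   # new posts over the same window


def trust_score(s: InstanceSnapshot) -> float:
    """Combine size, version and growth signals into a single sortable score."""
    # Damp raw user count so very large instances do not dominate.
    score = math.log10(max(s.users, 1))
    # Small bonus for running a current server version.
    if s.version_is_current:
        score += 1.0
    # Penalise user growth that is not matched by post growth.
    if s.user_growth > 0:
        posts_per_new_user = s.post_growth / s.user_growth
        unmatched = max(0.0, 1.0 - posts_per_new_user)
        score -= unmatched * math.log10(s.user_growth + 1)
    return score
```

With these assumed weights, an instance that gained 8000 users and 10 posts in the window scores well below one that gained 100 users and 200 posts, even though its raw user count is higher.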

Instances

Sort By Users:

Sort by "Smart Sort" (which already looks at incoming and outgoing federation blocks for each instance)

Communities

Sort by Comments:

Sort by "Smart Sort":

At the end of the day, how would I apply moderation? Ban all communities that have inflated user growth? Remove all communities with >5000 users?

sunaurus commented 1 year ago

A trust score is definitely super useful 👍

I would propose setting a trust score threshold for automatically hiding instances from all sorting methods, with maybe a by-default-checked checkbox somewhere like "hide instances with low trust score".
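A minimal sketch of that proposal, assuming each instance is a dict that already carries a `trust_score` value; the key names, the threshold of 2.0 and the `hide_low_trust` flag (standing in for the checkbox) are all hypothetical.

```python
def visible_instances(instances, threshold=2.0, hide_low_trust=True):
    """Return instances sorted by user count, optionally hiding low-trust ones.

    `hide_low_trust` models the by-default-checked checkbox: when it is
    unchecked (False), nothing is filtered out.
    """
    if hide_low_trust:
        # Drop anything scoring below the (assumed) threshold.
        instances = [i for i in instances if i["trust_score"] >= threshold]
    # Sort by users, largest first; sorted() returns a new list.
    return sorted(instances, key=lambda i: i["users"], reverse=True)
```

Applying the filter before sorting means the same threshold covers every sorting method, which is the point of the proposal.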

I realize that it has the danger of false positives, but I think a few false positives are the lesser evil compared to not filtering these out at all. Currently, the result of prominently showing these instances at the top of user count sort is quite bad: many Lemmy newbies will always sort by user count, and end up in one of these instances, totally disappointed.

Random example I saw:

tgxn commented 1 year ago

You're not wrong. There are a lot of instances mis-reporting user numbers.

"users": {
    "total": 8970,
    "activeHalfyear": 0,
    "activeMonth": 0
},

{
    "total": 17438,
    "activeHalfyear": 69,
    "activeMonth": 69
},

I'm sure these are not real. Or is it possible that lots of fake users are signing up?

Anyway I implemented some code to detect based on user activity, inspired by https://github.com/db0/lemmy-overseer/blob/main/overseer/observer.py#L56

It's looking a bit better, but I hope no valid ones were removed. I added a menu option to unhide them too.
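For readers who want a feel for activity-based detection, here is a minimal sketch that works on the `users` object shown above. It is not the code that was added to lemmy-explorer, nor the lemmy-overseer logic; the function name and both thresholds are assumptions.

```python
def is_suspicious(users: dict,
                  min_total: int = 1000,
                  min_active_ratio: float = 0.01) -> bool:
    """Flag an instance whose monthly active users are a tiny share of its total.

    `users` is expected to look like the "users" object quoted in the thread:
    {"total": ..., "activeHalfyear": ..., "activeMonth": ...}.
    """
    total = users.get("total", 0)
    active_month = users.get("activeMonth", 0)
    if total < min_total:
        # Small instances are left alone to limit false positives.
        return False
    return active_month / total < min_active_ratio
```

With these assumed thresholds, both examples quoted above are flagged: 0 of 8970 users active in the last month, and 69 of 17438 (about 0.4%).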

tgxn commented 1 year ago

Relating this to https://github.com/LemmyNet/lemmy/issues/2355 as these are more than likely the result of spambots creating accounts on instances that don't have verification enabled.

tgxn / lemmy-explorer

[Feature Request] "bot-infested" instance detection #64