twitter / the-algorithm

Source code for Twitter's Recommendation Algorithm
https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm
GNU Affero General Public License v3.0

Recommendation Algorithm Manipulation via mass blocks #1386

Open redknightlois opened 1 year ago

redknightlois commented 1 year ago

The current implementation allows for coordinated damage to an account's reputation, with no recourse. The most general issue is that global penalties are prone to being gamed (all of them). At another time I would have reported this through a vulnerability channel, but given that this is already popular knowledge, there is no point in doing so.

The reason is that there is nothing a user can do to get rid of it.

To Reproduce
Organize a botnet or a group of people with known similar views. Request your followers to block someone for 'reasons' (it doesn't matter here whether the reasons are valid or not). This is exploited by political parties, group-think, etc. Now that this is widely known, the vulnerability is plainly obvious.

Examples (included to show the behavior does exist, not to punish the users for anything; I had a lot to choose from):

https://twitter.com/BlockTheBlue
https://twitter.com/ayybeary/status/1642280442047995906
https://twitter.com/Kaptain_Kobold/status/1642379706925477888
https://twitter.com/MAYBEEELI/status/1642300879649792004
https://twitter.com/glenda_aus/status/1642282010462007296

There are apps that allow you to build/organize/weaponize this behavior.

While it has already been shut down, these are some of the stats for BlockTogether:

Steps to reproduce the behavior:

  1. Organize a group with a few friends (I have groups with 40+)
  2. Find a target, and execute the following tasks in order
  3. They should follow the target in preparation; a few days later, they unfollow first [just doing this in 90-day intervals also hurts].
  4. Then they will report a few "borderline" posts.
  5. Then they will mute.
  6. Then they will block.

Expected behavior
No global penalty should be applied, because global penalties can be gamed pretty easily; all penalties (if any) should be applied at the content level.

jbauernberger commented 1 year ago

Someone did a CVE already. What a time to be alive. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-29218

chaoscode commented 1 year ago

So we are opening CVEs for cancel culture now?

YellowAfterlife commented 1 year ago

So we are opening CVEs for cancel culture now?

A little different - currently the algorithm weights discovery based on the number of times a user has been blocked (by how much?), and seemingly without an "expiration" (unlike unfollows), meaning that:

  1. Users may permanently tank an account's visibility by muting/blocking it en masse, which is particularly interesting since the target might never notice (especially with mutes).
    [issue text and CVE describe this]
  2. Unless you only post the most lukewarm content¹, your visibility may degrade "organically" over the years.
    You told someone that they had their facts wrong in 2011 and they muted/blocked you in response? This will cost you, even a decade later.

¹ Though even then, you might mute an account simply because Twitter suggests it to you too often in the algorithmic feed. Have I unknowingly contributed to the downfall of several "funny animal pictures" accounts? Oh no.

I can think of problems that the current implementation solves, but if the behavior is exactly what the code suggests, it's probably going to be abused to hell by interested parties.

Relevant code: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b928b479b512ec51ac2c3821f5922/src/scala/com/twitter/interaction_graph/scio/agg_negative/InteractionGraphNegativeJob.scala#L52-L143
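
For readers skimming the thread, here is a minimal, self-contained sketch of the asymmetry being described. This is not the linked InteractionGraphNegativeJob code, and every name and constant is made up: it just assumes unfollows stop counting after a 90-day lookback while blocks and mutes accumulate with no expiration.

```scala
import java.time.{Duration, Instant}

object NegativeSignalSketch {
  sealed trait Signal { def targetId: Long; def at: Instant }
  final case class Unfollow(targetId: Long, at: Instant) extends Signal
  final case class Block(targetId: Long, at: Instant) extends Signal
  final case class Mute(targetId: Long, at: Instant) extends Signal

  // Assumption: unfollows only count inside a 90-day lookback window.
  val UnfollowLookback: Duration = Duration.ofDays(90)

  def negativeScore(signals: Seq[Signal], now: Instant): Double =
    signals.map {
      case Unfollow(_, at) if Duration.between(at, now).compareTo(UnfollowLookback) <= 0 => 1.0
      case Unfollow(_, _) => 0.0 // old unfollows age out
      case Block(_, _)    => 1.0 // blocks never age out in this sketch
      case Mute(_, _)     => 1.0 // neither do mutes
    }.sum

  def main(args: Array[String]): Unit = {
    val now = Instant.now()
    val decadeAgo = now.minus(Duration.ofDays(3650))
    val signals = Seq(Unfollow(42L, decadeAgo), Block(42L, decadeAgo), Mute(42L, decadeAgo))
    // Prints 2.0: the ten-year-old block and mute still count, the unfollow does not.
    println(negativeScore(signals, now))
  }
}
```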

jamesdigid commented 1 year ago

The muting aspect should be revised, if not removed altogether, as part of the weighting. I agree with the OP that we should be applying penalties at the content level and let people construct their artificial echo chambers.

I suppose the spirit of this is getting ahead of harmful content algorithmically by leveraging 'muting' and 'blocks' as a general signal. I think the ethical thing to do here would be to implement some type of time decay with helpful feedback to the user. I get the idea of wanting to get ahead of content people don't want to see, but having no ability for redemption, ever, is a major flaw.

What could be done is to de-rank posts only for people who are in the segment that blocked/muted the account originally, so as to not 're-offend' the community segment that originally blocked/muted it.
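
A rough sketch of what that could look like (purely hypothetical names, structure, and half-life, not anything from the repo): penalties are keyed to the community segment that produced the block/mute and decay over time, instead of feeding one global, permanent score.

```scala
import java.time.{Duration, Instant}

object SegmentScopedPenaltySketch {
  // One negative event (block/mute) against `authorId`, produced by a viewer in `segmentId`.
  final case class NegativeEvent(authorId: Long, segmentId: Int, at: Instant)

  val HalfLifeDays = 90.0 // assumed decay half-life, not a real Twitter constant

  private def decay(at: Instant, now: Instant): Double = {
    val ageDays = Duration.between(at, now).toDays.toDouble
    math.pow(0.5, ageDays / HalfLifeDays)
  }

  /** Penalty applied when ranking `authorId`'s posts for a viewer in `viewerSegment` only. */
  def penaltyFor(authorId: Long, viewerSegment: Int,
                 events: Seq[NegativeEvent], now: Instant): Double =
    events.iterator
      .filter(e => e.authorId == authorId && e.segmentId == viewerSegment)
      .map(e => decay(e.at, now))
      .sum

  def main(args: Array[String]): Unit = {
    val now = Instant.now()
    val events = Seq(
      NegativeEvent(7L, 1, now.minus(Duration.ofDays(10))),  // recent block from segment 1
      NegativeEvent(7L, 1, now.minus(Duration.ofDays(400))), // old block from segment 1
      NegativeEvent(7L, 2, now.minus(Duration.ofDays(10)))   // recent block from segment 2
    )
    println(penaltyFor(7L, 1, events, now)) // ~0.97: dominated by the recent segment-1 event
    println(penaltyFor(7L, 3, events, now)) // 0.0: segment 3 never blocked, so no penalty there
  }
}
```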

There really should be an algorithm for 'intelligent community clustering.' I haven't finished reading all the code yet so perhaps there is.

igorbrigadir commented 1 year ago

There really should be an algorithm for 'intelligent community clustering.' I haven't finished reading all the code yet so perhaps there is.

This is exactly the goal of SimClusters btw, but Mutes and Blocks are used for this: https://github.com/twitter/the-algorithm/tree/main/src/scala/com/twitter/simclusters_v2

jamesdigid commented 1 year ago

There really should be an algorithm for 'intelligent community clustering.' I haven't finished reading all the code yet so perhaps there is.

This is exactly the goal of SimClusters btw, but Mutes and Blocks are used for this: https://github.com/twitter/the-algorithm/tree/main/src/scala/com/twitter/simclusters_v2

Perfect. Is there a higher abstraction that applies negative signals at the community level? Something like a gamification abstraction layer, or insight into 'community collision' signals: for example, whether a mass block/mute event is the result of two community spaces interacting or of a coordinated attack? I haven't found this logic yet.

I'm just curious; surely Twitter has deeper insight into why mass block/mute events occur and how to differentiate between an artificial coordinated attack and a community collision.

redknightlois commented 1 year ago

From my experience in reinforcement learning, I have come to the realization that negative signals, especially global ones, are particularly tricky to get right. The reason is that algorithms are quick to identify how to game negative feedback, leading to rapid convergence into local minima. This pattern becomes evident when you add up all the negative feedback and your reputation starts declining towards zero.

I have observed that negative feedback can easily be exploited by an adversary to push the system into such a state much faster. As @YellowAfterlife mentioned, interestingly, in the limit everyone stabilizes around a zero reputation score. And given that the source code is now visible to everyone, errors arising from those negative signals carry a much bigger price.

I think the ethical thing to do here would be to implement some type of time decay with helpful feedback to the user. I get the idea of wanting to get ahead of content people don't want to see, but having no ability for redemption, ever, is a major flaw.

@jamesdigid it is not so easy. Even time decay can be gamed. Let's assume for simplicity that we use the 90-day decay that is currently applied to unfollows.


Essentially, you can generate follows and unfollows in 90-day intervals and keep the user in a deboosted state forever using a finite resource (bot accounts). This tactic cannot be used on small accounts, as the behavior would be visible and raise suspicion; on larger accounts, however, it can be pulled off with the help of a botnet-type attack.

Here's how it works:

On day 1, you follow the account with 1,000 bot accounts. On day 2, you follow it with another 1,000 accounts, and so on. On day 90, you follow it again with another 1,000 accounts. On day 91, you unfollow with the first 1,000 accounts. On day 92, you unfollow with the second 1,000 accounts and follow again with another 1,000, and so on.

The only user who sees this behavior is the account owner, who notices X follows and Y unfollows. Follows, however, do not affect the account's reputation, while unfollows do.

This cycle can be repeated indefinitely, leaving the account with the reputation hit of 90,000 extra unfollows at all times. Since people seldom unfollow others, this is a massive signal unless you weight it down to oblivion. I still believe the proper way of handling this is to use only positive signals to achieve population segmentation by content.
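
A back-of-the-envelope simulation of that steady state, assuming (hypothetically) that only unfollows from the last 90 days are penalized and that 1,000 bot accounts unfollow per day once the cycle starts:

```scala
object UnfollowAttackSimulation {
  val WindowDays      = 90   // assumed lookback: only these unfollows still count
  val UnfollowsPerDay = 1000 // bot accounts cycled into an unfollow each day

  // Penalized unfollows visible in the window, `day` days after the unfollow phase starts.
  def penalizedUnfollows(day: Int): Int =
    math.min(day, WindowDays) * UnfollowsPerDay

  def main(args: Array[String]): Unit = {
    for (day <- Seq(1, 45, 90, 180, 3650))
      println(f"day $day%4d: ${penalizedUnfollows(day)}%,d penalized unfollows in the window")
    // After day 90 the count stays pinned at 90,000 for as long as the botnet keeps cycling.
  }
}
```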

jamesdigid commented 1 year ago

Right, so timed coordinated attacks are almost exclusively conducted by a botnet, as opposed to something like a "call-to-arms" attack conducted through social influence. My point is that there should be signals capturing this somewhere, which I'm not finding. There ought to be efforts to spotlight bad-faith actors carrying out timed attacks against people of influence.

The timed attack you mentioned would come with a host of unnatural behaviors indicating that the system is being gamed.

I'm certain the type of abstraction I'm talking about exists somewhere in Twitter's codebase; I've tried searching some keywords without success.

jamesdigid commented 1 year ago

This issue seems related to https://github.com/twitter/the-algorithm/issues/127

If not directly related, it is at least connected, in the sense of being another way of mitigating mass block/mute attacks.

redknightlois commented 1 year ago

That's correct, it is similar in nature, though that solution does not solve the problem. If you don't post, you don't care about the deboosting. Bad actors exist (a fact of life) and will exploit this if given the chance. A persistent threat will also take advantage of economies of scale (because it can), so the solution has to either make the attack economically impossible and/or limit the impact it can have. Limiting blocks does nothing: with 10,000 users you can generate 1,000,000 blocks even if each account is only given 100 blocks per year. If a single actor (botnet) can generate that many for cheap, you need to adjust the weight by at least a similar margin (multiply by 0.000001). With LLMs, good luck figuring out whether I have created 10,000 users that post 1 to 5 times a day on whatever topic and even engage with people. It would probably make sense to just run the experiment to show how viable the attack is.
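
The arithmetic behind that weight adjustment, spelled out (illustrative numbers only):

```scala
object BlockBudgetArithmetic {
  def main(args: Array[String]): Unit = {
    val botAccounts   = 10000 // accounts controlled by a single operator
    val blocksPerYear = 100   // hypothetical per-account block budget
    val totalBlocks   = botAccounts * blocksPerYear

    // 1,000,000 blocks per year from one actor, despite the per-account cap.
    println(f"blocks one operator can generate per year: $totalBlocks%,d")
    // So the per-block weight would need to shrink by roughly the same factor (~0.000001)
    // before the aggregate signal stops being cheap to manufacture.
    println(f"weight scale needed to neutralize it: ${1.0 / totalBlocks}%.6f")
  }
}
```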


jamesdigid commented 1 year ago

So the crux of this problem is spotlighting botnet activity. I thought the point of the blue check mark was to mitigate that risk by pushing the cost of running botnets beyond the rewards?

PaulNewton commented 1 year ago

Related pull: https://github.com/twitter/the-algorithm/pull/660 "Limit penalization on blocks / mutes for a cooldown of 180 days", for issue #658 "Excessively penalizing accounts for being blocked or reported".