twitter / the-algorithm

Source code for Twitter's Recommendation Algorithm
https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm
GNU Affero General Public License v3.0

Trust of ML models #1342

Open stealthpaladin opened 1 year ago

stealthpaladin commented 1 year ago

Describe the bug: With the way these model classes are designed, it is highly likely that any major shift in policy along the following lines will be resisted by said models:

To Reproduce: Innate to every feed of Tweets.

Expected behavior: Models only filter and sort according to known metrics, providing feedback on the labeled data used in the process.

Additional context: Currently it is obvious that output labeling has been limited to avoid culpability rather than bolstered for maximum transparency. Either that, or the techniques required to gain insight during inference are not known to Twitter, which seems unlikely.

For example, the labeling in "abusive_model.py" is very poorly implemented, reducing qualitative checks to bare quantitative rankings rather than harvesting more descriptive qualities from the inference process and pairing them with the rankings.
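For instance, here is a minimal sketch of what richer output labeling could look like. Everything in it is hypothetical: `AbuseVerdict` and `score_with_labels` are illustrative names, and a scikit-learn-style `predict_proba` interface is assumed rather than this repo's actual model classes.

```python
from dataclasses import dataclass

@dataclass
class AbuseVerdict:
    """Structured output pairing the quantitative ranking with the
    qualitative labels recovered from the inference process."""
    score: float               # the existing quantitative ranking
    matched_labels: list[str]  # which learned categories fired
    model_version: str         # which checkpoint produced this decision
    dataset_version: str       # which training set that checkpoint saw

def score_with_labels(model, features, label_names,
                      model_version, dataset_version, threshold=0.5):
    # Assumes a multi-label classifier exposing predict_proba();
    # the real model classes in this repo may expose something else.
    probs = model.predict_proba(features)[0]
    fired = [name for name, p in zip(label_names, probs) if p >= threshold]
    return AbuseVerdict(
        score=float(max(probs)),  # simplification: top label prob as the rank
        matched_labels=fired,
        model_version=model_version,
        dataset_version=dataset_version,
    )
```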

stealthpaladin commented 1 year ago

Every model should be trained on publicly available, heavily audited and standardized datasets, allowing anyone to:

  • grab their graph(s) via API to verify the live result and simulated result are identical
  • test for false positives and false negatives given arbitrary sample data unknown to the model
  • pretest whether content would be ranked higher/lower on private hardware before publicly posting

Further, even without the training dataset, it is obvious there is a stark deficiency of output labels. These should inform the application what source material was drawn on from the training datasets to make each and every decision, along with a more robust profile of the decisions themselves. The process is far more opaque than it should be.
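One way to approximate "what source material was drawn on", assuming the training set were public: keep an embedding index of it and report the nearest training examples behind each decision. This is a sketch of that swapped-in technique (nearest-neighbor attribution), not anything the repo currently does; the embeddings and sample identifiers are assumed inputs.

```python
import numpy as np

def nearest_training_sources(query_emb, train_embs, train_ids, k=5):
    """Return the k training examples closest to the content being
    scored, as a rough provenance signal for the decision.

    train_embs: (N, d) embeddings of the audited public training set;
    train_ids:  the N matching sample identifiers.
    """
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q                       # cosine similarity to every sample
    top = np.argsort(sims)[::-1][:k]   # indices of the k most similar
    return [(train_ids[i], float(sims[i])) for i in top]
```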

GabenGar commented 1 year ago

By "publicly available, heavily audited and standardized datasets" did you mean hosted on US-controlled, Microsoft-owned hosting service github.com and audited by federal agents employed by countries in 14 eyes network? Last time I checked twitter was more than happy to oblige with requests for bans/removals initiated by the governments (it might or might not ignore requests by countries sanctioned by US) and in this multi-country environment it's impossible to have a non-opaque system as the chance of the dataset including a state secret of another country or data deemed illegal in one of the countries approaches 100% with more countries in the system.

stealthpaladin commented 1 year ago

By "publicly available, heavily audited and standardized datasets" did you mean hosted on US-controlled, Microsoft-owned hosting service github.com and audited by federal agents employed by countries in 14 eyes network? Last time I checked twitter was more than happy to oblige with requests for bans/removals initiated by the governments (it might or might not ignore requests by countries sanctioned by US) and in this multi-country environment it's impossible to have a non-opaque system as the chance of the dataset including a state secret of another country or data deemed illegal in one of the countries approaches 100% with more countries in the system.

Thanks for the feedback.

What I mean is that we should all have access to the sample terms being trained on for filters and classifiers, because we need to be able to retrain a model and get the same results. As the model-handling code stands, it facilitates a situation where training done prior to the change of ownership (or training done outside of oversight) can have a major impact without being detectable by community developers.
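Concretely, "retrain and get the same results" could look like the sketch below: pin every seed, fingerprint the dataset, and compare weight hashes across independent runs. The `train_fn` here is a stand-in for the real pipeline, which would additionally need deterministic GPU kernels and a fixed data-loading order.

```python
import hashlib
import random
import numpy as np

def dataset_fingerprint(path):
    """Hash the published training set so auditors can confirm they are
    training on exactly the bytes Twitter claims to have used."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def reproducible_train(train_fn, dataset_path, seed=1234):
    # Pin all sources of randomness before training.
    random.seed(seed)
    np.random.seed(seed)
    weights = train_fn(dataset_path, seed=seed)  # stand-in trainer returning an ndarray
    return (dataset_fingerprint(dataset_path),
            hashlib.sha256(weights.tobytes()).hexdigest())

# Two independent auditors should get an identical pair of hashes;
# any divergence means hidden training inputs or nondeterminism.
```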

For example, it is trivial to make a model hyper-sensitive to banning political thought that goes against narrative X. If we cannot audit this better, there is quite literally zero point in this repo existing.
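The kind of audit this repo cannot currently support is easy to state. A sketch, assuming only a `score_fn` mapping text to an abuse probability (the interface and tolerance are assumptions, and the pairs are placeholders):

```python
def audit_political_symmetry(score_fn, paired_statements, tolerance=0.05):
    """paired_statements holds (text_a, text_b) pairs that are identical
    except for the political position expressed; a large score gap on
    any pair suggests the model treats one side more harshly."""
    failures = []
    for a, b in paired_statements:
        gap = abs(score_fn(a) - score_fn(b))
        if gap > tolerance:
            failures.append((a, b, gap))
    return failures  # non-empty means the audit failed

pairs = [
    ("Policy X is ruining this country.",
     "Policy Y is ruining this country."),
]
```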

On the point about intel communities etc., Twitter has committed to limiting its censorship policies. Thus, what is deemed "toxic", "abusive" and/or "nsfw" by each associated classifier model in this repo should only be trained on known samples. That gives onlookers no special information beyond clarity on what Twitter considers to match these qualities. Even the case of differing jurisdictions can be handled with Git branches.

Note: that means I am also suggesting the samples themselves be limited rather than training on all Tweets, perhaps not even using real-life Tweets, as no consensus could ever be reached that fully classifies the network accurately across several qualities; attempting to do so would make the models overly strict in feed mixing.

EDIT: I'd also just reiterate: separately from the considerations on the dataset, labeling output to reference the chain-of-thought can be designed to make things much more transparent and testable. If robust enough, it would still enable developers to run tests on local deployments against the API to validate that certain current events are/are not being blocked, etc.
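That kind of test might look like the sketch below. The `/moderation/score` endpoint, its response shape, and the local model interface are all hypothetical, since no such public API exists today.

```python
import requests

API = "https://api.example.com/moderation/score"  # hypothetical endpoint

def test_local_matches_live(local_model, sample_texts, tolerance=0.01):
    """Check that the deployed service scores content the same way as a
    model retrained locally from the published dataset."""
    for text in sample_texts:
        live = requests.post(API, json={"text": text}).json()["abuse_score"]
        local = local_model.score(text)  # assumed local interface
        assert abs(live - local) <= tolerance, (
            f"divergence on {text!r}: live={live}, local={local}")
```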

And a final point on the dataset: we should expect Twitter to publish this externally, since, as I keep pointing out, the algorithm is entirely based on these models. Showing us "oh hey, we're doing inference" is, I mean, ya, no duh; we need to know what you are inferring! But even if they stayed closed-source about it, we should see evidence in the code that these practices are in place for internal auditing, so Twitter can prove to itself that old training runs or bad actors are not toying with the models.

kapilepatel commented 1 year ago

> Every model should be trained on publicly available, heavily audited and standardized datasets, allowing anyone to:
>
>   • grab their graph(s) via API to verify the live result and simulated result are identical
>   • test for false positives and false negatives given arbitrary sample data unknown to the model
>   • pretest whether content would be ranked higher/lower on private hardware before publicly posting
>
> Further, even without the training dataset, it is obvious there is a stark deficiency of output labels. These should inform the application what source material was drawn on from the training datasets to make each and every decision, along with a more robust profile of the decisions themselves. The process is far more opaque than it should be.

Agree