Let's test-run any taxonomies produced against the following incidents, which I selected to span a variety of tasks, technologies, and factors. All are heavily reported, but not necessarily at a depth that would allow for greatly reducing the uncertainty.
https://incidentdatabase.ai/cite/1
https://incidentdatabase.ai/cite/72
https://incidentdatabase.ai/cite/112

Example potential classifications
I have begun work on developing the three taxonomies described in this issue. An indicative snapshot of the current (malleable) status is:
Indicative next steps include:
I will share additional info (documentation, tools, repositories) shortly.
The full report on this: https://arxiv.org/abs/2211.07280
Closing this historical tracking issue and linking the improvements that we will prioritize per risk checklisting.
(This is an issue intended to engender discussion)
One of the projects we would like to pursue is the development of a technical failure taxonomy that indicates the technical factors at play in producing an AI incident. Technical factors are challenging to determine since most incidents give only high-level information about what happened; details about the technology and why the incident happened are typically lacking. Still, some incidents do provide more information, and those lacking concrete technical information are often amenable to analysis derived from first principles about how such systems are typically implemented and how they can fail. Thus, if we establish a hierarchical taxonomy in which classifications are marked as either speculative or grounded in evidence, then we can produce a system whereby we state (1) with factual grounding for most incidents, while (2) and (3) would be increasingly speculative:

(1) The task the AI system is applied to
(2) The technologies applied to the task
(3) The technical failures of those technologies
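As a rough sketch, an annotation record under this scheme might look like the following. All field names, enum values, and classifications here are hypothetical, not a settled schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Grounding(Enum):
    """How strongly a classification is supported by the incident reports."""
    GROUNDED = "grounded in evidence"   # explicitly stated in reporting
    SPECULATIVE = "speculative"         # inferred from first principles

@dataclass
class Classification:
    value: str             # e.g., "face recognition"
    grounding: Grounding   # every annotation carries an evidence label

@dataclass
class IncidentAnnotation:
    incident_id: int
    task: Classification                                              # level (1): usually grounded
    technologies: list[Classification] = field(default_factory=list)  # level (2)
    failures: list[Classification] = field(default_factory=list)      # level (3): most speculative

# Illustrative annotation (not a real classification of any incident):
example = IncidentAnnotation(
    incident_id=101,
    task=Classification("face recognition", Grounding.GROUNDED),
    technologies=[Classification("convolutional neural network", Grounding.SPECULATIVE)],
    failures=[Classification("distributional bias", Grounding.SPECULATIVE)],
)
```

Attaching the grounding label to each classification, rather than to the incident as a whole, lets level (1) stay evidence-based while levels (2) and (3) are flagged as speculative.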
The nice thing about this is that there is a strong probabilistic relationship between these hierarchical levels:
--> The set of potential technologies involved in an incident is limited to those technologies that may be applied to the task.
--> Similarly, technologies can fail in specific ways when applied to a task (a minimal sketch follows below).
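One minimal way these constraints could be encoded is as lookup tables, so that annotating the task alone already bounds the candidate failure set. All task, technology, and failure names below are illustrative placeholders, not actual taxonomy entries:

```python
# Hypothetical constraint tables: which technologies plausibly implement a
# task, and how each technology can fail when applied to that task.
TASK_TO_TECHNOLOGIES = {
    "face recognition": {"convolutional neural network", "siamese network"},
    "content recommendation": {"collaborative filtering", "neural ranking model"},
}

TECHNOLOGY_TO_FAILURES = {
    "convolutional neural network": {"distributional bias", "adversarial perturbation"},
    "siamese network": {"distributional bias"},
    "collaborative filtering": {"feedback loop", "popularity bias"},
    "neural ranking model": {"feedback loop", "reward hacking"},
}

def potential_failures(task: str) -> set[str]:
    """The failure set for an incident is bounded by the technologies
    that could plausibly have been applied to its task."""
    failures: set[str] = set()
    for technology in TASK_TO_TECHNOLOGIES.get(task, set()):
        failures |= TECHNOLOGY_TO_FAILURES.get(technology, set())
    return failures

print(sorted(potential_failures("face recognition")))
# ['adversarial perturbation', 'distributional bias']
```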
Thus, while it is proving difficult to develop a taxonomy of AI failures on the basis of news reports alone (we have tried...), we can likely annotate (1) and (2), then leverage this context to generate a set of potential technical failures. We won't know what definitely happened, but we can develop the set of potential technical factors and share this context across incidents having the same applications and technologies.
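To make the sharing step concrete, one possible (purely illustrative) mechanism is to key candidate failure sets by the (task, technology) pair, so analysis done for one incident transfers to every incident carrying the same annotations. The incident IDs and classifications below are hypothetical:

```python
from collections import defaultdict

# Hypothetical annotated incidents: (incident_id, task, technology).
ANNOTATIONS = [
    (101, "chatbot", "sequence-to-sequence model"),
    (102, "face recognition", "convolutional neural network"),
    (103, "face recognition", "convolutional neural network"),
]

# Candidate failures already worked out for one (task, technology) pair
# (illustrative values, not real classifications).
KNOWN_FAILURES = {
    ("face recognition", "convolutional neural network"):
        {"distributional bias", "underrepresented training data"},
}

def shared_failure_context(annotations, known_failures):
    """Group incidents by (task, technology) so a failure set derived for
    one incident is inherited by every incident with the same annotations."""
    groups = defaultdict(list)
    for incident_id, task, technology in annotations:
        groups[(task, technology)].append(incident_id)
    return {
        key: {"incidents": ids, "potential_failures": known_failures.get(key, set())}
        for key, ids in groups.items()
    }

for key, context in shared_failure_context(ANNOTATIONS, KNOWN_FAILURES).items():
    print(key, context)
```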
I am proposing that this project be developed in the following segments, to be tracked across a set of new incidents to be created and linked below: