responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Implement additional definitional taxonomy/tags to apply to records (e.g. Audit, Hazard, Vulnerability, Controversy, Assessment) #2292

Open kepae opened 10 months ago

kepae commented 10 months ago

This issue is to discuss technical implementations for a task envisioned in #2047.

Adopt definitions for Audit, Hazard, Vulnerability, Risk, Assessment, and Controversy, then index all records from databases disposed towards interoperability meeting those definitions. Databases with useful but non-conforming definitions can also be indexed, but under an "other records" heading.

The primary goal of this issue is to advance the query and discovery interfaces so they can "capture" an expanding number of complementary and/or competing terms used to describe records of harm or near-harm events implicating AI and intelligent systems, without getting trapped by, stuck in, or dismissing the situational import of the different terms. Consult https://github.com/responsible-ai-collaborative/aiid/issues/2047#issue-1728355411 for further context.

Choosing and maintaining a set of external definitions is something to be discussed organizationally, but a technical implementation can be planned.

Option 1: New taxonomy of external definitions?

Perhaps the best option to flexibly support external definitions (without losing the platform's emphasis on incidents) is to create a new AIID-maintained taxonomy that applies a strict set of high-level enum definitions to report- and incident-level objects. This is possible now that classification objects can also point to report records. (This is in contrast to creating one new low-level detailed taxonomy for each external definition, which has its own merits but requires more input from a partner institution if desired.)

Such a classification can be read as: "Report/Incident N is/represents a/indicates an event X." (Perhaps as defined by Entity E.)

Ideally, for ease of maintaining and reading the classifications, there would be one classification record containing all N applicable tag definitions for exactly one report, rather than N classification records each applying to that one report. The taxonomy page could detail the set of adopted definitions and cite sources in more detail.

This approach is easier to maintain than creating a single detailed taxonomy for each possible supplemental definition, and creating one classification under each taxonomy. We can't yet commit to maintaining N detailed taxonomies -- something ideally done with domain partners -- but we can potentially commit to high-level tagging.
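To make the Option 1 shape concrete, here is a minimal sketch of what a single bundled classification record could look like. The type and field names (`SupplementaryClassification`, `namespace`, `report_number`, etc.) are illustrative assumptions, not the actual AIID schema:

```typescript
// Hypothetical shape for one high-level classification record that bundles
// all supplementary-definition tags for exactly one report or incident.
// Field names are illustrative, not the real AIID collection schema.
type DefinitionTag =
  | "Audit"
  | "Hazard"
  | "Vulnerability"
  | "Risk"
  | "Assessment"
  | "Controversy";

interface SupplementaryClassification {
  namespace: "AIID-Definitions"; // hypothetical taxonomy namespace
  report_number?: number;        // points at a report...
  incident_id?: number;          // ...or at an incident
  tags: DefinitionTag[];         // all applicable definitions in one record
  notes?: string;                // optional editor rationale / source citation
}

// Reads as: "Report 1234 represents a Hazard and indicates an Audit."
const example: SupplementaryClassification = {
  namespace: "AIID-Definitions",
  report_number: 1234,
  tags: ["Hazard", "Audit"],
  notes: "Definitions per the externally adopted framework (TBD).",
};
```

Keeping the tags as an enum-typed array in one record is what makes the "one classification per report" reading cheap to maintain and query.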

Option 2: Or, just use tags fields in report objects.

It could just be more effective and maintainable to add the tags directly to the report records. However, these tags don't exist on incidents, and classifications can apply to both objects (and express more metadata).
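For comparison, the Option 2 approach reduces to array membership on the report record itself, with no join through a classifications collection. This is a sketch under assumed field names (`tags` does exist on reports per the discussion below, but the tag values here are hypothetical):

```typescript
// Sketch of the tags-on-reports alternative (Option 2).
// The shape is simplified; tag values are illustrative.
interface Report {
  report_number: number;
  title: string;
  tags: string[]; // e.g. ["definition:hazard", "definition:audit"]
}

// Filtering reports by a definitional tag is a plain membership check --
// but note this cannot express incident-level classifications.
function reportsWithTag(reports: Report[], tag: string): Report[] {
  return reports.filter((r) => r.tags.includes(tag));
}
```

The simplicity is the appeal; the limitation, as noted above, is that incidents have no `tags` field and tags carry less metadata than classifications.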

@cesarvarela -- in addition to ingesting datasets that introduce other types, having this data on the platform would make taxonomy-level querying more helpful for the goal of #2047. While the data goal is sound, the query interface could still be revisited -- especially if that means improving the Discover application rather than building a distinct interface.

Current approach

Let's go with Option 1, with the mentality that the AIID is adopting and maintaining a set of supplementary, high-level definitions that describe incident and issue reports. In that sense, they will no longer be purely "external."

Aside from the stakeholder process of adopting such supplementary terms to characterize reports, we should also consider how to make this data easier to add and maintain for editors.

kepae commented 4 months ago

I did some archaeology and found https://github.com/responsible-ai-collaborative/aiid/issues/403, which implies that the tag field was originally intended to categorize types of reports directly. It doesn't deal with the substance of the report (e.g. alleged vulnerability, hazard, issue...) but does capture the type of report: journalistic, academic, response, audit, etc.

If we can find a structured way to maintain a list of allowable tags, we should consider just going with tags.

We can still use a taxonomy approach for handling substance and external definitions, but using tags might better serve report types for now. (And hopefully a future migration or mapping of tag values to something more generalizable wouldn't be hard.)
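A "structured way to maintain a list of allowable tags" could be as small as a checked allowlist. This is a sketch only; the `report-type:` prefix and the tag values (taken from the report types mentioned above) are assumptions, not an existing convention:

```typescript
// Hypothetical allowlist of report-type tags; the "report-type:" prefix
// and these values are illustrative, not an established AIID convention.
const ALLOWED_REPORT_TYPE_TAGS = new Set([
  "report-type:journalistic",
  "report-type:academic",
  "report-type:response",
  "report-type:audit",
]);

// Return any report-type tags on a record that are not in the allowlist,
// so an editor tool or CI check can flag them. Tags outside the
// "report-type:" prefix are ignored.
function invalidReportTypeTags(tags: string[]): string[] {
  return tags
    .filter((t) => t.startsWith("report-type:"))
    .filter((t) => !ALLOWED_REPORT_TYPE_TAGS.has(t));
}
```

A later migration could then map each allowlisted value onto a taxonomy classification if the high-level definitions of Option 1 are adopted.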