responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Issue, Hazard, Risk, Vulnerability, Audits, and Controversy Support #2047

Open smcgregor opened 1 year ago

smcgregor commented 1 year ago

This is a working document with some elements that are ready for development

While there is convergence on what constitutes an "AI incident," there are still considerable differences in how the concept of an "incident in waiting" is defined. We call these "issues," the OECD calls them "hazards," Robust Intelligence calls them "risks," ARVA calls them "vulnerabilities," AIAAIC calls them "controversies," and various algorithmic assessment organizations call them audits (or at least, an audit will always produce one or more incidents in waiting). All of these vary subtly in definition, application, and use between organizations.

The role of the Responsible AI Collaborative, going back to the original research publication, has always been to act as the union of multiple perspectives and to provide tools that support sharing across those perspectives. This is a challenging proposition. Pretty much every multi-stakeholder ontological project I am aware of has inevitably degenerated into never-ending discussions over the most difficult elements to define. For something that has no underlying, singular "right" answer, it is best to find ways of moving forward that don't require universal agreement. The purpose of this GitHub issue is to detail how to proceed technologically without needing to resolve the definitional question of "incident in waiting."

The AIID's current entrant into this space is the "AI Issue." We chose this term intentionally to cover multiple aspects of "incident in waiting." It is meant to be specific enough to capture elements of risk, while general enough to cover the field. The "issue" term also means we can index concepts covered by other communities and link out to those communities if/when they operate their own processes. While we would prefer such organizations join the Responsible AI Collaborative and integrate from the beginning, that will not be possible universally (e.g., when a database is operated by a sovereign state). Therefore, we need to maintain flexibility.

This also plugs into the drive for federating the AI Incident Database -- something we will soon have a test case for with an index of deepfakes. Incident databases for things like deepfakes require different editing processes and metadata. How federation works for incidents is fairly clear: incidents have a natural scope that will support federating responsibilities among multiple nodes. However, this does not work for incidents in waiting. Often there is no concrete definition of what specific system can produce the incident. Worse, every system will produce a great many incidents when placed into the wrong context. Behind every system is an infinity of risks. This is why the ForHumanity audit criteria center on these four elements:

Scope: The boundaries of a system, what is covered, what is not covered
Nature: The forces and processes that influence and control the variables and features
Purpose: The aim or goal of a system
Context: The circumstances in which an event occurs; including jurisdiction and/or location, behaviour and functional inputs to an AAA System that are appropriate

Without some variation of these elements, the risks producible by a system cannot be bounded or expressed in any meaningful or useful way. For example, an LLM can be applied to an infinity of applications (safely or unsafely), while a webserver logging vulnerability is inherently scoped to the webserver. LLMs are scope- and context-free, yet present incidents in waiting in a massive array of circumstances. There is no closed world within which to index their risks, so they defy enumeration.

More concretely,

Problem: The safety community currently lacks an enumerable definition of "system+context," and we are likely never to have one. The notion of a system constantly changes with version, deployment circumstance, organizational processes, etc. The world context for these systems similarly evolves through time. Absent a more universal grounding of system+context, it is not possible to enumerate risks in a useful way; there will be too much noise.

Solution: Organizing Issues in terms of a numeric identifier or hierarchical structure is a road to editorial ruin. Don't attempt to universally enumerate context-free risk. Instead of organizing issues according to a definite scope, tag the issues themselves with salient attributes; those tags can then be queried according to values of interest to populate a listing.

Let me introduce this by example.

Example Applied to an LLM

<< For illustrative purposes only >>

Press Release: "Dolittle LLM runs all LLMs produced to date with RLHF selecting among candidate outputs to produce an unbeatable hybrid LLM."

audit: "Dolittle can generate several classes of malware through prompt hacking, Dolittle may attempt to end people's marriages"
Audit Metadata {identifiers for hundreds of constituent LLMs, scope, nature, purpose, context, structured representation of findings, ...}

hazard, risk, and vulnerability Record Metadata: {identifiers for hundreds of constituent LLMs, additional reporting, various taxonomies...}

controversy 1: "This new superintelligent AI is coming for your marriage" Controversy Metadata {company, ...}

(subsequent incident) "Incident 27311: Dolittle LLM allegedly produced malware that subsequently destroyed the records of 17 hospital systems"
Metadata {Relevant Issue reports, Event Date, Alleged Developer, Alleged Deployer(s), Alleged Harmed Party(ies), Event Data, ...}
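
To make the "tag and query" idea concrete, here is a rough data sketch (illustrative only; every field name below is an assumption, not a schema proposal) of how reports like the ones above could be represented as tagged documents rather than entries in a fixed hierarchy:

```ts
// Illustrative only -- not the AIID schema. The point is that each report
// carries salient attribute tags, and listings are produced by querying
// those tags rather than by enumerating "system+context" up front.
interface IssueReport {
  report_number: number;                    // hypothetical identifier
  kind: 'audit' | 'hazard' | 'risk' | 'vulnerability' | 'controversy';
  title: string;
  systems: string[];                        // identifiers for implicated systems
  tags: Record<string, string | string[]>;  // scope, nature, purpose, context, ...
}

const dolittleAudit: IssueReport = {
  report_number: 1234,                      // hypothetical
  kind: 'audit',
  title: 'Dolittle can generate several classes of malware through prompt hacking',
  systems: ['Dolittle LLM' /* plus identifiers for constituent LLMs */],
  tags: {
    scope: 'text generation',
    purpose: 'general-purpose assistant',
    context: 'public chat deployment',
  },
};
```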

Now what can we do with this? Let's consider each of the report types as issue reports and present them all on a new page, but first we need to decide which reports are queried.

Populating an Issue Profile from a Query

Here I am introducing a new collection type of "Issue Profile," which is something that is programmatically generated from reports and never edited directly.

It is easy to present singular reports in isolation; that is what we are already doing here. What we are missing is some notion of issue profiles whereby elements of audit, risk, vulnerability, etc. can be jointly presented. Issue profiles can be queried from the collection of metadata expressed across all reports.

User Story 1: "I want to know whether a particular model I am considering using has been implicated in any risks so I can decide whether I integrate it into my product"
Query: {select the model and its target operating context and see what returns}

User Story 2: "I want to know whether a particular scope has been identified as at-risk in an audit for any systems so I can know what to worry about"
Query: {select the scope of interest and see which audited systems return}

User Story 3: "I want to know all the examples of LLM jailbreaks consistent with the Dolittle model so I can begin training safety systems"
Query: {select vulnerabilities for the Dolittle system and subset to input/output data}

User Story 4: "I want to monitor the space of emerging risks across all similarly disposed systems"
Query: {select a collection of similarly positioned systems}
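
As a sketch of how such queries could operate over tagged reports (again assuming the illustrative IssueReport shape above, not the real schema), User Stories 1 and 3 might reduce to simple predicates over the tags:

```ts
// Sketches only: 'reports' is assumed to be a collection of IssueReport
// documents like the Dolittle example above.

// User Story 1: has this model, in roughly this operating context,
// been implicated in any risks?
const implicatedRisks = (reports: IssueReport[], model: string, context: string) =>
  reports.filter((r) => r.systems.includes(model) && r.tags.context === context);

// User Story 3: all vulnerability reports recorded against Dolittle,
// which could then be subset to input/output data for safety training.
const dolittleVulnerabilities = (reports: IssueReport[]) =>
  reports.filter((r) => r.kind === 'vulnerability' && r.systems.includes('Dolittle LLM'));
```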

After generating the query, what gets displayed?

New Page Type for Joining Issue Reports Returned by the Query

Right now the /cite/### pages have the following sections,

We can define each of these as follows,

Much of this still requires discussion, but there are several elements on which we can proceed.

Required Functionality in Codebase

These are likely "Epics" in the agile world.

Today (ready for work)

Soon (needs more definition)

Eventually (whenever other efforts become ready)

Many flowers are blooming. We look to make a bouquet.

cesarvarela commented 1 year ago

To get us all on the same page, this is the current schema:

image
cesarvarela commented 1 year ago

Update option n1:

For example, we create a new taxonomy to store systems, such as the Dolittle LLM, and another taxonomy to store controversy metadata. Then, when a new controversy is added to the AIAAIC, a new report is created and linked to the appropriate AIAAIC and LLM classifications.

Something I don't like about this approach is that we might push the taxonomy concept too far. Technically we could make everything a taxonomy, and right now querying taxonomies doesn't offer the best experience because attribute values are serialized (this is fixable with a custom GraphQL endpoint).

image
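
To illustrate the serialization pain point (field names are approximate, just to show the shape of the problem):

```ts
// Approximate shape of a classification attribute as stored today:
// the value is a serialized JSON string, so filtering by it means
// parsing strings on the client rather than querying the database.
const serializedAttribute = {
  short_name: 'Sector of Deployment',
  value_json: '["transportation"]',
};

// What a custom GraphQL endpoint/resolver could expose instead, so that
// queries like "sector in [...]" can run server-side:
const structuredAttribute = {
  short_name: 'Sector of Deployment',
  value: ['transportation'],
};
```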

cesarvarela commented 1 year ago

Update option n2:

To index the Dolittle Audit, we do the following:

I understand the definition of "system" is a moving target, so we might end up forcing the system abstraction to accommodate very different things.

image

smcgregor commented 1 year ago

Something I don't like about this approach is that we might push the taxonomy concept too far. Technically we could make everything a taxonomy, and right now querying taxonomies doesn't offer the best experience because attribute values are serialized (this is fixable with a custom GraphQL endpoint).

It sounds like the custom GraphQL endpoint may take care of the rough points? Is there a blog post to read about it? I am thinking about how we could move more of the taxonomy definition into the UI in the future rather than having it be something that requires engineering support.

cesarvarela commented 1 year ago

Displaying classification data associated with reports

image

Clicking on the icon opens a modal with the classifications associated with the report:

image

Clicking on the external link icon opens the report page with all of its classifications, the same way we do with incidents:

image

ping @smcgregor @kepae

kepae commented 1 year ago

I like the look. What do you think about making the icon blue, like other links/interactable pieces in the report component?

I think designing the actual summary page experience will be more involved and have more opinions. :-)

smcgregor commented 1 year ago

+1 to @kepae.

Displaying the icon only when classifications exist?

cesarvarela commented 1 year ago

Mockup for the reports discover page:

image

Clicking on the Add taxonomy button shows a modal that lets you add taxonomies to the current query:

image

Clicking on the Add attribute button shows a modal that lets you add attributes to the current taxonomy:

image

ping @smcgregor @kepae

kepae commented 1 year ago

Awesome, this matches my intuition for a display, and it's nice to look at something real. I have a few questions that come to mind; they relate to giving more query power to the user and how "query-able" the representation of taxonomies is generally.

1) The user might wish to execute an OR query among fields/attributes within a particular taxonomy. This can help with categorical values (e.g., in CSET, sector of deployment: transportation OR law enforcement). Similarly, a user might be interested in an intersection (AND) between taxonomies, isolating events that are similarly defined in two different taxonomies (e.g., CSET && GMF). Can these elements of the query be customized, or why should we fix them? (There are research methodology questions around using two unrelated taxonomies, but it could be helpful when they address different perspectives...)

2) How can we express negations in the query UI? This is especially useful for categorical values (e.g., NOT transportation, hate speech detection).

3) How are ordinal values to be stored in taxonomies and then queried here? For example, consider a toy "severity of harm" attribute that is an ordinal metric with the values none, minimal, and major. A user may wish to query for incidents or reports that represent at least a minimal severity of harm, which in this case would include the minimal && major severities. Radio buttons would not suffice. The most desirable option would be an interface that collects the ordinal values for a taxonomy attribute and displays a range-like selector. However, I have to look into the current taxonomies and see whether that is even reasonable to query currently. (It almost certainly will be a desirable query in the future, and we should support taxonomies that have ordinal metrics.)

If we don't have ordinal metrics in taxonomies, this isn't blocking but is a future pain point.

I'm going to think more about how each taxonomy attribute "type" can/should be queried and whether the taxonomy schemas currently support that.
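
As a strawman, here is the kind of query shape I have in mind (illustrative only, loosely following a combinator/rules structure; the TOY namespace, field names, and ordinal ordering are made up):

```ts
// Strawman query object -- not a proposal for the actual schema.
const exampleQuery = {
  combinator: 'and',
  rules: [
    // OR among categorical values within one taxonomy attribute
    {
      namespace: 'CSETv0',
      field: 'Sector of Deployment',
      operator: 'in',
      value: ['transportation', 'law enforcement'],
    },
    // Negation of a categorical value
    {
      namespace: 'CSETv0',
      field: 'Sector of Deployment',
      operator: 'notIn',
      value: ['hate speech detection'],
    },
    // Ordinal "at least" query, assuming the taxonomy declares an ordering
    // such as none < minimal < major
    {
      namespace: 'TOY',
      field: 'Severity of Harm',
      operator: '>=',
      value: 'minimal',
    },
  ],
};
```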

kepae commented 1 year ago

Other points from today:

This kind of filtering also supports the need for a custom resolver... Let's talk more about this and see what the minimal backend changes would have to be.

cesarvarela commented 12 months ago

Deploy preview:

https://deploy-preview-56--cesarvarela-staging.netlify.app/apps/systems/

This is an example with filters already set: https://deploy-preview-56--cesarvarela-staging.netlify.app/apps/systems/?filters[0][type]=taxonomy&filters[0][config][namespace]=CSETv0&filters[0][config][query][combinator]=and&filters[0][config][query][rules][0][field]=Annotator&filters[0][config][query][rules][0][operator]=%3D&filters[0][config][query][rules][0][value]=1&filters[0][initialized]=true
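
For readability, the filters parameter in that URL decodes to roughly:

```ts
// Decoded from the query string above
const filters = [
  {
    type: 'taxonomy',
    config: {
      namespace: 'CSETv0',
      query: {
        combinator: 'and',
        rules: [{ field: 'Annotator', operator: '=', value: '1' }],
      },
    },
    initialized: true,
  },
];
```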

I'll work on #2281 next, so there is better data to play around with.

cesarvarela commented 11 months ago

New update: now we can point fingers at ChatGPT:

https://deploy-preview-56--cesarvarela-staging.netlify.app/apps/systems/?filters=%5B%7B%22type%22%3A%22taxonomy%22%2C%22config%22%3A%7B%22namespace%22%3A%22AILD%22%2C%22query%22%3A%7B%22combinator%22%3A%22or%22%2C%22rules%22%3A%5B%7B%22field%22%3A%22Name%20of%20Algorithm%20List%22%2C%22operator%22%3A%22in%22%2C%22value%22%3A%5B%22Bard%22%2C%22ChatGPT%22%2C%22Copilot%22%5D%7D%5D%7D%7D%7D%5D
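
Decoded, the filters parameter in that URL is:

```ts
// Decoded from the URL-encoded filters parameter above
const filters = [
  {
    type: 'taxonomy',
    config: {
      namespace: 'AILD',
      query: {
        combinator: 'or',
        rules: [
          {
            field: 'Name of Algorithm List',
            operator: 'in',
            value: ['Bard', 'ChatGPT', 'Copilot'],
          },
        ],
      },
    },
  },
];
```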