Open smcgregor opened 1 year ago
To get us all on the same page, this is the current schema:
Update option n1:
For example, we create a new taxon to store systems, such as the Dolittle LLM, and another taxon to store controversy metadata. Then, when a new controversy is added to the AIAAIC, a new report is created and linked to the appropriate AIAAIC and LLM classifications.
Something I don't like about this approach is that we might push the taxonomy concept too far. Technically we could make everything a taxonomy, and right now querying taxonomies doesn't have the best experience because attribute values are serialized (this is fixable with a custom GraphQL endpoint).
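To make the serialization concern concrete, here is a minimal sketch of what a custom resolver would have to do before attribute values become queryable. The field names (`short_name`, `value_json`) and the stringified-JSON storage are assumptions for illustration, not the actual schema:

```typescript
// Hypothetical shape: classification attributes are stored as strings,
// so a custom resolver must parse them back into typed JSON before querying.
interface SerializedAttribute {
  short_name: string;
  value_json: string; // e.g. '"transportation"' or '["a","b"]' or '3'
}

interface Classification {
  namespace: string; // e.g. "CSETv0"
  attributes: SerializedAttribute[];
}

// Deserialize a classification's attributes into a queryable object.
function deserialize(c: Classification): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const a of c.attributes) {
    try {
      out[a.short_name] = JSON.parse(a.value_json);
    } catch {
      out[a.short_name] = a.value_json; // fall back to the raw string
    }
  }
  return out;
}
```

Until something like this runs server-side, every client has to do the parsing itself, which is the "not the best experience" part.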
Update option n2:
systems
collection to store things like the "Dolittle LLM" and allow them to be linked to one or multiple reports. To index the Dolittle Audit, we do the following:
I understand the definition of "system" is a moving target, so we might end up forcing the system abstraction to accommodate very different things.
It sounds like the custom GraphQL endpoint may take care of the rough points? Is there a blog post to read about it? I am thinking about how we could move more of the taxonomy definition into the UI in the future, rather than something that requires engineering support.
Clicking on the icon brings a modal with the classifications associated with the report:
Clicking on the external link icon brings the report page with all its classifications the same way we do with incidents:
ping @smcgregor @kepae
I like the look. What do you think about making the icon blue, like other links/interactable pieces in the report component?
I think designing the actual summary page experience will be more involved and have more opinions. :-)
+1 to @kepae.
Displaying the icon only when classifications exist?
Mockup for the reports discover:
The query does an `OR` of the items that match what is set for each taxonomy component, and an `AND` for each attribute added inside each taxonomy component.

Clicking on the Add taxonomy button shows a modal that lets you add taxonomies to the current query:
Clicking on the Add attribute button shows a modal that lets you add attributes to the current taxonomy:
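For reference, the combinator semantics described above can be sketched like this (a hedged sketch with hypothetical shapes, not the actual implementation): a report matches if it satisfies any taxonomy component, and a taxonomy component is satisfied only when all of its attribute rules hold:

```typescript
// Hypothetical filter model: OR across taxonomy components,
// AND across the attribute rules inside each component.
interface AttributeRule {
  field: string;
  operator: "=" | "!=";
  value: unknown;
}

interface TaxonomyFilter {
  namespace: string;
  rules: AttributeRule[];
}

type Report = { classifications: Record<string, Record<string, unknown>> };

function ruleHolds(attrs: Record<string, unknown>, r: AttributeRule): boolean {
  const v = attrs[r.field];
  return r.operator === "=" ? v === r.value : v !== r.value;
}

// A report matches when at least one taxonomy component is fully satisfied.
function matches(report: Report, filters: TaxonomyFilter[]): boolean {
  return filters.some((f) => {
    const attrs = report.classifications[f.namespace];
    return attrs !== undefined && f.rules.every((r) => ruleHolds(attrs, r));
  });
}
```

Writing it down this way makes the later questions (per-taxonomy `OR`, negation, ordinals) easier to reason about, since each is a change to `ruleHolds` or to the combinators.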
ping @smcgregor @kepae
Awesome, this matched my intuition for a display and it's nice to look at something real. I have a few questions that come to mind, and they relate to giving more query power to the user and how "query-able" the representation of taxonomies are generally.
1) The user might wish to execute an `OR` query among fields/attributes within a particular taxonomy. This can help with categorical values (e.g. in CSET, sector of deployment: transportation `OR` law enforcement). Similarly, a user might be interested in an intersection (`AND`) between taxonomies, isolating events that are similarly defined in two different taxonomies (e.g. CSET …).
2) How can we express negations in the query UI? Especially useful for categorical values (e.g. `NOT` transportation, hate speech detection).
3) How are ordinal values to be stored in taxonomies and then queried here?
For example, consider a toy "severity of harm" attribute that is an ordinal metric with the values `none`, `minimal`, and `major`. A user may wish to query for incidents or reports that represent at least a minimal severity of harm, which would in this case include the `minimal` and `major` severities.
Radio buttons would not suffice. The most desirable option would be an interface that collects the ordinal values for a taxonomy attribute and displays a range-like selector. However, I have to look into the current taxonomies and see whether that is even reasonable to query currently. (It almost certainly will be a desirable query in the future, and we should support taxonomies that have ordinal metrics.)
If we don't have ordinal metrics in taxonomies, this isn't blocking but is a future pain point.
I'm going to think more about how each taxonomy attribute "type" can/should be queried and whether the taxonomy schemas currently support that.
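The ordinal case above can be sketched as follows (illustrative only; current taxonomy schemas may not declare an ordering). If an attribute declares its ordered values, an "at least X" query expands into the set of admissible values, which is exactly what a range-like selector would drive:

```typescript
// Hypothetical ordinal attribute: the taxonomy declares the order of values.
const severityOrder = ["none", "minimal", "major"] as const;
type Severity = (typeof severityOrder)[number];

// Expand an "at least X" query into the set of matching categorical values,
// so it can be answered with the same machinery as an IN(...) filter.
function atLeast(threshold: Severity): Severity[] {
  const i = severityOrder.indexOf(threshold);
  return severityOrder.slice(i);
}

function matchesAtLeast(value: Severity, threshold: Severity): boolean {
  return atLeast(threshold).includes(value);
}
```

The nice property is that the backend never needs a new operator: an ordinal range query lowers to a plain set-membership query over categorical values.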
Other points from today:
the query section should minimize easily to put focus on the results without scrolling
the goal is to surface incidents and separate issue reports to start, not the reports underlying incidents that are already shown. So, by default (perhaps with explicit user control), the query should filter out reports that are used to substantiate incidents already present in the results. It should not filter out reports that are used to substantiate other incidents not returned by the query.
This kind of filtering also supports the need for a custom resolver... Let's talk more about this and see what the minimal backend changes would have to be.
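A sketch of that default filter (hypothetical result shapes): a report is dropped only when some incident already in the results cites it; reports tied solely to incidents outside the result set are kept:

```typescript
// Hypothetical result shapes for the discover query.
interface IncidentResult { incident_id: number; reports: number[] }
interface ReportResult { report_number: number }

// Keep a report only if no incident in the current results already cites it.
// Reports tied solely to incidents *outside* the results are not filtered.
function filterSubstantiatingReports(
  incidents: IncidentResult[],
  reports: ReportResult[]
): ReportResult[] {
  const cited = new Set<number>();
  for (const i of incidents) for (const r of i.reports) cited.add(r);
  return reports.filter((r) => !cited.has(r.report_number));
}
```

Because the filter depends on which incidents came back in the same result set, it can't be a static query predicate, which is why it points toward a custom resolver.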
Deploy preview:
https://deploy-preview-56--cesarvarela-staging.netlify.app/apps/systems/
This is an example with filters already set: https://deploy-preview-56--cesarvarela-staging.netlify.app/apps/systems/?filters[0][type]=taxonomy&filters[0][config][namespace]=CSETv0&filters[0][config][query][combinator]=and&filters[0][config][query][rules][0][field]=Annotator&filters[0][config][query][rules][0][operator]=%3D&filters[0][config][query][rules][0][value]=1&filters[0][initialized]=true
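For anyone reading those URLs, the bracketed parameters decode into a nested filter object. This is a simplified qs-style decoder for illustration, not the library the preview actually uses, and all values stay strings:

```typescript
// Minimal qs-style decoder for keys like "filters[0][config][namespace]".
// Illustrative only: handles bracket paths and numeric indices, nothing more.
function parseBracketedParams(query: string): any {
  const root: any = {};
  for (const pair of query.split("&")) {
    const [rawKey, rawVal = ""] = pair.split("=");
    const key = decodeURIComponent(rawKey);
    const value = decodeURIComponent(rawVal);
    // "filters[0][type]" -> ["filters", "0", "type"]
    const path = key.split(/[\[\]]+/).filter((s) => s.length > 0);
    let node = root;
    for (let i = 0; i < path.length - 1; i++) {
      const part = path[i];
      if (node[part] === undefined) {
        // Create an array when the next path segment is a numeric index.
        node[part] = /^\d+$/.test(path[i + 1]) ? [] : {};
      }
      node = node[part];
    }
    node[path[path.length - 1]] = value;
  }
  return root;
}
```

So the example URL decodes to one taxonomy filter on `CSETv0` with a single rule, `Annotator = 1`, combined with `and`.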
I'll work on #2281 next, so there is better data to play around with.
New update: now we can point fingers at ChatGPT:
This is a working document with some elements that are ready for development
While there is convergence on what constitutes an "AI incident", there are still considerable differences in how the concept of an "incident in waiting" is defined. We call these "issues," the OECD is calling them "hazards," Robust Intelligence calls them "risks," ARVA calls them "vulnerabilities," AIAAIC calls them "controversies," and various algorithmic assessment organizations call them audits (or at least, audits will always produce one or more incidents in waiting). All these things vary subtly in definition, application, and use between organizations.
The role of the Responsible AI Collaborative, going back to the original research publication, has always been to act as the union of multiple perspectives and provide tools to support sharing across those perspectives. This is a challenging proposition. Pretty much every multi-stakeholder ontological project I am aware of has inevitably degenerated into never-ending discussions over the most difficult elements to define. For something that has no underlying, singular "right" answer, it is best to find ways of moving forward that don't require universal agreement. The purpose of this GitHub issue is to detail how to proceed technologically without needing to resolve the definitional question of "incident in waiting".
The AIID's current entrant into this space is the "AI Issue." We chose this term intentionally to cover multiple aspects of "incident in waiting." It is meant to be specific enough to capture elements of risk, while general enough to cover the field. The "issue" term also means we can index concepts covered by other communities and link out to those communities if/when they operate their own processes. While we would prefer such organizations join the Responsible AI Collaborative and integrate from the beginning, that will not be possible universally (e.g., when a database is operated by a sovereign state). Therefore, we need to maintain flexibility.
This also plugs into the drive for federating the AI Incident Database -- something that we will soon have a test case for with an index of deepfakes. Incident databases for things like deepfakes require different editing processes and metadata. How federation works with incidents is fairly clear. Incidents have a natural scope that will support federating responsibilities among multiple nodes. However, this does not work for incidents in waiting. Often there is no concrete definition of what specific system can produce the incident. Worse, all systems will produce a great many incidents when placed into the wrong context. Behind every system is an infinity of risks. This is why the ForHumanity audit criteria centers on these four elements,
Scope: The boundaries of a system, what is covered, what is not covered
Nature: The forces and processes that influence and control the variables and features
Purpose: The aim or goal of a system
Context: The circumstances in which an event occurs; including jurisdiction and/or location, behaviour and functional inputs to an AAA System that are appropriate
Without some variation of these elements, the risks producible by a system cannot be bounded or expressed in any meaningful or useful way. For example, an LLM can be applied to an infinity of applications (safely or unsafely), while a webserver logging vulnerability is inherently scoped to the webserver. LLMs are scope/context free and yet present incidents in waiting in a massive array of circumstances. There is no closed world within which to index their risks, so they defy enumeration.
More concretely,
Problem: The safety community currently lacks an enumerable definition of "system+context," and we are likely never to have one. The notion of a system constantly changes with version, deployment circumstance, organizational processes, etc. The world context for these systems similarly evolves through time. Absent a more universal grounding of system+context, it is not possible to enumerate them in a useful way. There will be too much noise.
Solution: Organizing issues in terms of a numeric identifier or hierarchical structure is a road to editorial ruin. Don't attempt to universally enumerate context-free risk. Instead of organizing issues according to a definite scope, issues themselves can be tagged according to salient attributes, and those tags can then be queried according to values of interest that populate a listing.
Let me introduce this by example.
Example Applied to a LLM
<< For illustrative purposes only >>
Press Release: "Dolittle LLM runs all LLMs produced to date with RLHF selecting among candidate outputs to produce an unbeatable hybrid LLM."
Audit: "Dolittle can generate several classes of malware through prompt hacking; Dolittle may attempt to end people's marriages"
Audit Metadata: {identifiers for hundreds of constituent LLMs, scope, nature, purpose, context, structured representation of findings, ...}
Hazard, Risk, and Vulnerability Record Metadata: {identifiers for hundreds of constituent LLMs, additional reporting, various taxonomies, ...}
Controversy 1: "This new superintelligent AI is coming for your marriage"
Controversy Metadata: {company, ...}
(subsequent incident) "Incident 27311: Dolittle LLM allegedly produced malware that subsequently destroyed the records of 17 hospital systems"
Metadata {Relevant Issue reports, Event Date, Alleged Developer, Alleged Deployer(s), Alleged Harmed Party(ies), Event Data, ...}
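Under the solution above, each of these records becomes a tagged document, and a listing is populated by querying the tags rather than by any global enumeration. Here is a toy sketch with the Dolittle records (all record shapes, tag names, and values are hypothetical):

```typescript
// Toy tagged-report store: no hierarchy, no global enumeration.
// Reports carry attribute tags and listings are produced by querying them.
interface TaggedReport {
  title: string;
  kind: "audit" | "risk" | "controversy" | "incident";
  tags: Record<string, string>;
}

const taggedReports: TaggedReport[] = [
  { title: "Dolittle Audit", kind: "audit",
    tags: { system: "Dolittle LLM", scope: "prompt hacking" } },
  { title: "Coming for your marriage", kind: "controversy",
    tags: { system: "Dolittle LLM", company: "Dolittle" } },
  { title: "Incident 27311", kind: "incident",
    tags: { system: "Dolittle LLM", harm: "malware" } },
];

// Populate a listing from whatever tag values are of interest.
function queryByTags(
  all: TaggedReport[],
  wanted: Record<string, string>
): TaggedReport[] {
  return all.filter((r) =>
    Object.entries(wanted).every(([k, v]) => r.tags[k] === v)
  );
}
```

Querying `{ system: "Dolittle LLM" }` pulls the audit, the controversy, and the subsequent incident into one listing without anyone having enumerated Dolittle's risks in advance.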
Now what can we do with this? Let's consider each of the report types as issue reports and present them all in a new page, but first we need to decide which reports are queried.
Populating an Issue Profile from a Query
Here I am introducing a new collection type of "Issue Profile," which is something that is programmatically generated from reports and never edited directly.
It is easy to present singular reports in isolation; that is what we are already doing here. What we are missing is some notion of issue profiles whereby elements of audits, risks, vulnerabilities, etc. can be jointly presented. Issue profiles can be queried from the collection of metadata expressed across all reports.
User Story 1: "I want to know whether a particular model I am considering using has been implicated in any risks so I can decide whether I integrate it into my product"
Query: {select the model and its target operating context and see what returns}
User Story 2: "I want to know whether a particular scope has been identified as at-risk in an audit for any systems so I can know what to worry about"
Query: {select the model and its target operating context and see what returns}
User Story 3: "I want to know all the examples of LLM jailbreaks consistent with the Dolittle model so I can begin training safety systems"
Query: {select vulnerabilities for the Dolittle system and subset to input/output data}
User Story 4: "I want to monitor the space of emerging risks across all similarly disposed systems"
Query: {select a collection of similarly positioned systems}
After generating the query, what gets displayed?
New Page Type for Joining Issue Reports Returned by the Query
Right now the `/cite/###` pages have the following sections. We can define each of these as follows,
Much of this still requires discussion, but there are several elements on which we can proceed.
Required Functionality in Codebase
These are likely "Epics" in the agile world.
Today (ready for work)
#2292: the `/cite/###` pages. Design-wise, although the page is populated like an Amazon shopping cart, it can be presented more like the static `/cite/###` pages where the tags are headings with report cards underneath. Whoever picks this up should talk with @lmcnulty since there are overlaps here with the risk checklisting work.
#2281: the `/cite/###` page so that taxonomies are added from the tools panel rather than a panel that displays to every permissioned user. We are going to have a lot more taxonomies.
Soon (needs more definition)
Eventually (whenever other efforts become ready)
Many flowers are blooming. We look to make a bouquet.