responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Support entity-entity relationships #2975

Open kepae opened 3 months ago

kepae commented 3 months ago

Now that we have a base interface for editing entities, we can more easily curate their basic data and the potential relationships between them. Following up on @pdcp1's mock here, we can start implementing basic relationships, beginning with a catch-all related relationship while we figure out a more durable system to commit to.

The primary function this would serve is to relate our soup of alleged developers, deployers, and victims implicated in AI incident reports.

Additionally, an entity relationship system can allow us to record and name AI systems and models and relate them to incidents, developers, and other AIID data.

Per https://github.com/responsible-ai-collaborative/aiid/issues/2536#issuecomment-1960443680:

An initial mockup for the Entity edit page is in our Figma workspace https://www.figma.com/file/KI28jWrOO3soKp9dTg7b0d/Entities-workflow?type=design&node-id=0%3A1&mode=design&t=a1M9nWWeHzoisF2d-1

We should take this as a kick-off design. I added some fields that I consider appropriate but we can discuss which ones are more important to implement in an initial phase. Feel free to edit it. I'm open to any suggestions and discussions.

Option 1

Simply extend the entity object with an array of related entity IDs.

{
  "entity_id": "amazon-warehouse-workers",
  "name": "Amazon warehouse workers",
  "related": [
    "amazon", // entity_id
    "warehouse-workers"
  ]
}
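
A minimal sketch of what Option 1 implies for writes, assuming an in-memory store keyed by entity_id. The `Entity` interface, the `relate` helper, and the "Amazon" display name are hypothetical and only illustrate the idea: because each entity carries its own `related` array, a symmetric relationship has to be written to both documents.

```typescript
// Hypothetical entity shape for Option 1; field names follow the example above.
interface Entity {
  entity_id: string;
  name: string;
  related: string[];
}

// Relate two entities in both directions. Each entity only stores its own
// list of related IDs, so both documents need to be updated.
function relate(store: Record<string, Entity>, a: string, b: string): void {
  const first: Entity | undefined = store[a];
  const second: Entity | undefined = store[b];
  if (first === undefined || second === undefined) {
    throw new Error(`Unknown entity: ${first === undefined ? a : b}`);
  }
  // Skip IDs that are already present so repeated calls do not duplicate them.
  if (first.related.indexOf(b) === -1) first.related.push(b);
  if (second.related.indexOf(a) === -1) second.related.push(a);
}

// Illustrative data only; the "Amazon" display name is assumed.
const entities: Record<string, Entity> = {
  "amazon": { entity_id: "amazon", name: "Amazon", related: [] },
  "amazon-warehouse-workers": {
    entity_id: "amazon-warehouse-workers",
    name: "Amazon warehouse workers",
    related: [],
  },
};

relate(entities, "amazon-warehouse-workers", "amazon");
```

In a real database the two writes would be separate document updates, so keeping both sides in sync is something this option would have to handle explicitly.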

Pros:

Cons:

Option 2

Create a collection specifically for entity-entity relationships, i.e. an edge table. We could store these as basic semantic triples (subject, predicate, object), with additional metadata about whether or not the relationship is symmetric.

This would be an early implementation of semantic triples: https://en.wikipedia.org/wiki/Semantic_triple

{
  "pred": "related",
  "sub": "amazon-warehouse-workers",
  "obj": "amazon",
  "is_symmetric": true
}
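
A minimal sketch of how such an edge table could be read, assuming the triples are already loaded into an array. The `Triple` interface mirrors the example above; the `relatedTo` helper is hypothetical. It shows how the `is_symmetric` flag lets a single stored row answer the lookup from either side.

```typescript
// Shape of one edge-table row, mirroring the Option 2 example above.
interface Triple {
  pred: string;
  sub: string;
  obj: string;
  is_symmetric: boolean;
}

// Return the IDs related to `entityId` through predicate `pred`.
// A row always matches from its subject side; it matches from its
// object side only when the relationship is marked symmetric.
function relatedTo(rows: Triple[], entityId: string, pred: string): string[] {
  const found: string[] = [];
  for (const row of rows) {
    if (row.pred !== pred) continue;
    if (row.sub === entityId) {
      found.push(row.obj);
    } else if (row.is_symmetric && row.obj === entityId) {
      found.push(row.sub);
    }
  }
  return found;
}

const triples: Triple[] = [
  { pred: "related", sub: "amazon-warehouse-workers", obj: "amazon", is_symmetric: true },
];

const fromSubject = relatedTo(triples, "amazon-warehouse-workers", "related");
const fromObject = relatedTo(triples, "amazon", "related");
```

Here both lookups find the other entity from the same stored row, so a symmetric relationship does not need to be written twice.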

Pros:

Cons:

Other options

Nota Bene

We are casually entering further into the territory of linked/semantic data. Many implementations and standards for such data already exist, e.g. RDFa, JSON-LD.

We should consider how we can make existing standards of linked data work for us in the future. Incident data and reports of AI harms are inherently graph-like in the real world as well as on AIID.

Related issues

pdcp1 commented 3 months ago

Great analysis 👌 I'm leaning towards option 2 since it offers the flexibility we need as the system grows.

As you pointed out, this approach enables us to easily incorporate new types of relationships and metadata. While it’s a bit more complex upfront, option 2 gives us a robust foundation to build on, keeping things clean and efficient as we grow. We need to consider the query performance since this approach requires joining data from different MongoDB collections. I'm not clear about the API implications.
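
To make the join concern concrete, here is a rough sketch of a MongoDB aggregation pipeline, run against the relationships collection, that resolves both ends of each triple with `$lookup`. The `entities` collection name, the output field names, and the `relatedEntitiesPipeline` helper are assumptions for illustration and are not taken from the AIID codebase.

```typescript
// Build an aggregation pipeline, intended to run on the (hypothetical)
// relationships collection, that finds triples touching `entityId` and
// joins in the entity documents for both ends of each triple.
function relatedEntitiesPipeline(entityId: string): Record<string, unknown>[] {
  return [
    // Match rows where the entity is the subject, or the object of a
    // symmetric relationship.
    { $match: { $or: [{ sub: entityId }, { obj: entityId, is_symmetric: true }] } },
    // Join the object entity. "entities" is an assumed collection name.
    { $lookup: { from: "entities", localField: "obj", foreignField: "entity_id", as: "obj_entity" } },
    // Join the subject entity.
    { $lookup: { from: "entities", localField: "sub", foreignField: "entity_id", as: "sub_entity" } },
  ];
}

const examplePipeline = relatedEntitiesPipeline("amazon");
```

Each `$lookup` stage is a join against another collection, so indexes on `sub`, `obj`, and `entity_id` would likely matter for the query performance mentioned above.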