Open kepae opened 3 months ago
Great analysis 👌 I'm leaning towards option 2 since it offers the flexibility we need as the system grows.
As you pointed out, this approach enables us to easily incorporate new types of relationships and metadata. While it’s a bit more complex upfront, option 2 gives us a robust foundation to build on, keeping things clean and efficient as we grow. We need to consider the query performance since this approach requires joining data from different MongoDB collections. I'm not clear about the API implications.
Now that we have a base interface for editing entities, we can more easily curate their basic data and potential relationships between them. Following up @pdcp1's mock here, we can now start to implement basic relationships, starting with a catch-all `related` relationship while we figure out a more durable system to commit to editing.
The primary function this would serve is to relate our soup of alleged developers, deployers, and victims implicated in AI incident reports.
Additionally, an entity relationship system can allow us to record and name AI systems and models and relate them to incidents, developers, and other AIID data.
Per https://github.com/responsible-ai-collaborative/aiid/issues/2536#issuecomment-1960443680:
Option 1
Simply extend the entity object with an array of `related` entity IDs.
Pros:
Cons:
- `related` is a symmetric relationship; we would want both entities to have the property. Thus, we would have to synchronously update two entities that are currently stored separately in the `entities` collection – and ideally validate this continuously.
- [...] `developed_by`), will we just add another field to the mongo document? Or begin the migration to...
Option 2
Create a collection specifically for entity-entity relationships, i.e. an edge table. We could store these as basic semantic triples (subject, object, predicate), with additional metadata about whether or not the property/relationship is symmetric.
We would be entering an early implementation of semantic triples: https://en.wikipedia.org/wiki/Semantic_triple
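As a rough illustration of the edge-table idea, here is a minimal in-memory sketch in TypeScript. The `EntityRelationship` shape, the `symmetric` flag name, the `relatedEntities` helper, and the entity IDs are all assumptions made for this example; nothing here is an agreed schema, and a real implementation would query the MongoDB collection rather than an array.

```typescript
// Hypothetical shape of one document in an entity-relationship (edge) collection.
interface EntityRelationship {
  subject: string;    // ID of the entity the statement is about
  predicate: string;  // relationship name, e.g. "related" or "developed_by"
  object: string;     // ID of the entity on the other end
  symmetric: boolean; // true when the relationship holds in both directions
}

// Return the IDs of all entities linked to `entityId` by `predicate`.
// A symmetric edge is stored once and matched from either side;
// a directed edge only matches when `entityId` is the subject.
function relatedEntities(
  edges: EntityRelationship[],
  entityId: string,
  predicate: string
): string[] {
  const found = new Set<string>();
  for (const edge of edges) {
    if (edge.predicate !== predicate) continue;
    if (edge.subject === entityId) {
      found.add(edge.object);
    } else if (edge.symmetric && edge.object === entityId) {
      found.add(edge.subject);
    }
  }
  return Array.from(found).sort();
}

// Example data with made-up entity IDs.
const edges: EntityRelationship[] = [
  { subject: "entity-a", predicate: "related", object: "entity-b", symmetric: true },
  { subject: "system-x", predicate: "developed_by", object: "entity-a", symmetric: false },
];

console.log(relatedEntities(edges, "entity-b", "related"));      // looked up from the object side
console.log(relatedEntities(edges, "system-x", "developed_by")); // directed lookup from the subject
```

Storing a symmetric edge once and resolving it from either side is one way to avoid the two-document update that Option 1 would require; whether that trade-off holds up at query time is still an open question.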
Pros
- [...] `developed_by`.
- [...] `developed_by`, which would be complemented with a `develops` relationship).
Cons
Other options
Nota Bene
We are casually entering further into the territory of linked/semantic data. Many implementations and standards for such data already exist, e.g. RDFa, JSON-LD.
We should consider how we can make existing standards of linked data work for us in the future. Incident data and reports of AI harms are inherently graph-like in the real world as well as on AIID.
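To make the JSON-LD direction a little more concrete, the sketch below shows how a single (subject, predicate, object) triple could be expressed as a JSON-LD node object. `@context`, `@id`, and `"@type": "@id"` are standard JSON-LD keywords; the base IRIs, the `Triple` type, the `tripleToJsonLd` helper, and the entity IDs are placeholders assumed for illustration only.

```typescript
// Hypothetical base IRIs; no real AIID vocabulary is implied.
const ENTITY_BASE = "https://example.org/entities/";
const VOCAB_BASE = "https://example.org/vocab#";

interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Express one triple as a JSON-LD node object.
// The context maps the predicate name to an IRI and marks its value
// as an IRI reference ("@type": "@id") rather than a plain string.
function tripleToJsonLd(t: Triple): Record<string, unknown> {
  return {
    "@context": {
      [t.predicate]: { "@id": VOCAB_BASE + t.predicate, "@type": "@id" },
    },
    "@id": ENTITY_BASE + t.subject,
    [t.predicate]: ENTITY_BASE + t.object,
  };
}

const doc = tripleToJsonLd({
  subject: "system-x",
  predicate: "developed_by",
  object: "entity-a",
});
console.log(JSON.stringify(doc, null, 2));
```

If relationships are stored as triples, a mapping like this could stay a thin export layer, so committing to a particular linked-data standard can be deferred.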
Related issues