responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Discover incidents vs reports #936

Closed lmcnulty closed 1 year ago

lmcnulty commented 2 years ago

The results of searches in the Discover app correspond to reports, not incidents. Several people have found this confusing, including Janet, me, and the subject in the think-aloud. The design seems to have been inherited from the time when reports were the top-level entries in the database and incidents were just an id field shared among reports. This is no longer the case.

I think we should rework the Discover app to search incidents rather than reports. This is what people will naturally expect from something called an AI Incident Database. We could just try to explain it better, but I think that would be swimming against the current. To see what it would mean to search incidents rather than reports, let's go through the fields that currently exist in the Algolia index:

| Field | Notes |
| --- | --- |
| title | We now have these for incidents, and they're generally higher-quality than the report titles. |
| authors | I suspect very few people actually care about this, but if we want to keep it, we could just join the authors of all the reports. We could call it something like "reporters" or "reported by" since incidents don't have authors. |
| description | We have these for incidents now. |
| epoch_date_downloaded | I doubt that anyone wants to search by this – it can probably be omitted. |
| epoch_date_modified | ^ |
| epoch_date_submitted | ^ |
| epoch_date_published | This could be interesting if there were a significant number of incidents that were reported on long after they occurred. I haven't checked, but I suspect that this is rarely the case. It can probably be omitted too. |
| image_url | We don't have these for incidents – for the TSNE visualization I just used the first report's image, which I think is fine. |
| language | We could make this an array that contains all languages for which reports of the incident exist. |
| source_domain | Some of the "authors" are just the sites that the reports come from, so I think this could be combined with authors. |
| submitters | I don't think many people want to search by this. |
| url | ^ |
| editor_notes | ^ |
| text | I think we should just concatenate the report texts and let Algolia search those. It shouldn't show up in the UI – if people want to read about a report beyond the title and description, they'll click the result and go to the citation page. |
| incident_date | This is the main date that matters, and I think we should highlight it more. You should be able to sort results by it. |
| classifications | I believe these are already associated with incidents, not reports. They should also probably be emphasized more – the user in the think-aloud said that they were the most interesting part. |

An additional bonus to using incidents instead of reports is that the index will be smaller and we'll be able to stay on Algolia's free tier longer.
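The field-by-field proposal above can be sketched as a function that collapses one incident and its reports into a single search record. This is a minimal sketch, not the project's code: the function name `build_incident_record`, the input dictionary shapes, and the output keys `reporters` and `languages` are assumptions made for illustration.

```python
def build_incident_record(incident, reports):
    """Collapse an incident and its reports into one search record.

    The input and output shapes are hypothetical; only the field names
    discussed in the table above are taken from the issue.
    """

    def unique(values):
        # Deduplicate while preserving first-seen order.
        return list(dict.fromkeys(values))

    return {
        "incident_id": incident["incident_id"],
        "title": incident["title"],
        "description": incident["description"],
        "incident_date": incident["incident_date"],
        # Incidents have no authors, so join the authors of all reports.
        "reporters": unique(a for r in reports for a in r["authors"]),
        # Every language in which a report of the incident exists.
        "languages": unique(r["language"] for r in reports),
        # Incidents have no image, so fall back to the first report's image.
        "image_url": reports[0]["image_url"] if reports else None,
        # Concatenated report text: searchable, not meant for display.
        "text": "\n\n".join(r["text"] for r in reports),
    }


# Made-up example data.
incident = {
    "incident_id": 1,
    "title": "Example incident",
    "description": "An example description.",
    "incident_date": "2020-01-01",
}
reports = [
    {"authors": ["Ada"], "language": "en",
     "image_url": "https://example.com/a.png", "text": "First report."},
    {"authors": ["Ada", "Grace"], "language": "es",
     "image_url": "https://example.com/b.png", "text": "Second report."},
]
record = build_incident_record(incident, reports)
```

With one record per incident instead of one per report, the index holds as many records as there are incidents, which is where the size reduction comes from.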

smcgregor commented 2 years ago

On the top level I agree, but we differ in approach. Let me break the table you present apart a bit.

> I doubt that anyone wants to search by this – it can probably be omitted.

Editors sometimes need to; otherwise, I agree.

Regarding language

> We could make this an array that contains all languages for which reports of the incident exist.

I think this should be extracted via GraphQL queries on the reports rather than by maintaining two separate data structures that will need to be synchronized.

Regarding your source_domain comment,

> Some of the "authors" are just the sites that the reports come from, so I think this could be combined with authors.

I disagree: authors and institutions are different things. Even the same author can shift their writing voice substantially as they change institutions or editors. The cases where the authors and the institutions are the same are exceptions where the institution does not give authors bylines (e.g., The Economist).

Regarding submitters,

> submitters | I don't think many people want to search by this.

The submitters do!

Regarding text,

> text | I think we should just concatenate the report texts and let Algolia search those. It shouldn't show up in the UI – if people want to read about a report beyond the title and description, they'll click the result and go to the citation page.

I think you are implying that the contents found here will be central to the card presentation in the Discover app. I am good with this, but I don't think this is an "either we present reports or we present incidents" question. It looks like you want to eliminate fields; how about we instead add the incident info from apps/incidents to the Algolia records so it can be presented in the Algolia-derived cards? Then the rendering of the cards could flip between an incidents mode and a reports mode, and we could still search across all the report text.

Part of the value of indexing the report text is that more of the vocabulary pertaining to an incident should be covered so Algolia will be able to provide much more complete indexing.
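As a rough illustration of this alternative, the sketch below keeps one record per report and copies the parent incident's title and description onto each one. The function name `enrich_report_records` and the keys `incident_title` and `incident_description` are hypothetical, chosen only for this example.

```python
def enrich_report_records(reports, incidents_by_id):
    """Attach incident info to each report record (hypothetical field names).

    Every report keeps its own fields, including its full text, so search
    still covers all report text; the incident fields are added alongside.
    """
    enriched = []
    for report in reports:
        incident = incidents_by_id[report["incident_id"]]
        enriched.append({
            **report,
            "incident_title": incident["title"],
            "incident_description": incident["description"],
        })
    return enriched


# Made-up example data.
incidents_by_id = {
    1: {"title": "Example incident", "description": "An example description."},
}
reports = [
    {"incident_id": 1, "title": "Report A", "description": "About A.", "text": "Text A."},
    {"incident_id": 1, "title": "Report B", "description": "About B.", "text": "Text B."},
]
enriched = enrich_report_records(reports, incidents_by_id)
```

The trade-off is duplication: the same incident title and description are stored on every report of that incident, in exchange for not maintaining a second index.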

smcgregor commented 1 year ago

Steps

  1. Add the incident title and description to the current Algolia indices
  2. Add the entities to the current Algolia indices
  3. Change the Incidents rendering (i.e., when the dropdown selects "Incidents") so that the incident title and description show rather than the report title and description. I am not sure what the best way is to manage rendering different contents within the cards, but I would guess it makes sense to have different rendering logic within a component that receives the Algolia data.
  4. Add filters for entities
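
Steps 3 and 4 can be sketched as two small helpers: one that picks which title and description a card shows for the selected mode, and one that builds a filter string for entities. This is only a sketch of the logic, written in Python rather than as a UI component. The names `card_content` and `entity_filter`, the record keys `incident_title` and `incident_description`, and the attribute name `entities` are assumptions; the `attribute:"value"` and `OR` forms follow Algolia's filter string syntax.

```python
def card_content(hit, mode):
    """Pick the title and description a card shows for the selected mode."""
    if mode == "incidents":
        return {
            "title": hit["incident_title"],
            "description": hit["incident_description"],
        }
    # Any other mode falls back to the report's own fields.
    return {"title": hit["title"], "description": hit["description"]}


def entity_filter(entities):
    """Build a filter string matching records tagged with any given entity.

    Names containing double quotes would need escaping, which this
    sketch does not handle.
    """
    return " OR ".join(f'entities:"{name}"' for name in entities)


# Made-up example hit.
hit = {
    "title": "Report A",
    "description": "About A.",
    "incident_title": "Example incident",
    "incident_description": "An example description.",
}
incident_card = card_content(hit, "incidents")
report_card = card_content(hit, "reports")
filters = entity_filter(["Google", "Amazon"])
# filters == 'entities:"Google" OR entities:"Amazon"'
```

Keeping the mode switch in one function that receives the search hit means the surrounding card layout can stay the same for both modes.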