responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

GMF labels describing non-technical failure causes #2704

Open npit opened 7 months ago

npit commented 7 months ago

The GMF taxonomy characterizes incidents with respect to technical failure causes, i.e. operations during the implementation, training, validation, deployment, etc., of the AI system involved in the incident that are linked to unexpected system performance, which in turn generates societal harm. Such failure causes may include methodological errors, design shortcomings, dataset inadequacies, fitting issues, etc.

However, a number of incidents in AIID report harm that is not a direct result of some technical failure of the deployed AI system, but rather the outcome of miscellaneous other failings. So that every annotated incident has some description of failure, I have included labels with ambiguous technical relevance during these early stages of GMF development.

Such labels and annotated incidents are listed below and will be updated periodically:

In the future, we will need to decide how to resolve such cases. Should we expand GMF with non-technical failure causes, or should we, e.g., include a catch-all Non Technical Failure label?

Additional thoughts from @kepae include classifying non-technical failures as technical ones for AI systems that advertise or guarantee some level of safety. E.g., if a publicly available LLM advertises itself as safe and appropriate for minors and generates inappropriate outputs from normal prompts, that would be considered a technical failure (e.g. Inappropriate Training Content). If the model does not guarantee such a thing, then a non-technical failure description such as Unsafe Exposure or Access would apply instead.
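A minimal sketch of what the catch-all option and the advertised-safety distinction could look like, using an illustrative TypeScript shape (the `FailureLabel` type and field names are hypothetical, not the actual GMF/AIID schema):

```typescript
// Hypothetical sketch, not the actual GMF/AIID schema: each failure label
// carries an explicit technical / non-technical axis, with an optional
// catch-all label for incidents whose harm has no technical cause.

type FailureKind = "technical" | "non-technical";

interface FailureLabel {
  label: string;       // e.g. "Inappropriate Training Content"
  kind: FailureKind;
}

// Catch-all option discussed above: a single generic non-technical label.
const NON_TECHNICAL_CATCH_ALL: FailureLabel = {
  label: "Non Technical Failure",
  kind: "non-technical",
};

// The same inappropriate-output incident annotated differently depending on
// whether the deployed LLM advertised safety for minors.
const advertisedSafeForMinors: FailureLabel = {
  label: "Inappropriate Training Content",
  kind: "technical",
};

const noSafetyGuarantee: FailureLabel = {
  label: "Unsafe Exposure or Access",
  kind: "non-technical",
};
```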

kepae commented 7 months ago

Thanks Nikiforos for documenting all of this (and capturing my last reflection accurately :-)). I think it would be interesting to explore a confidence modifier for the technical <-> non-technical axis, just as we have "potential" and "known". I don't think it needs to be implemented on each category in the same way, but there is something special about signaling that it is a non-technical failure. This might be inconvenient to express in our current taxonomy implementation, but I can look into it.
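A minimal sketch of what such a modifier could look like, reusing the illustrative shape from the sketch above; the field names are hypothetical and not the current taxonomy implementation:

```typescript
// Hypothetical sketch: a per-label modifier on the technical <-> non-technical
// axis, analogous to the existing "known" / "potential" confidence modifiers.

type Confidence = "known" | "potential";
type TechnicalityConfidence = "technical" | "non-technical" | "uncertain";

interface ModifiedFailureLabel {
  label: string;
  confidence: Confidence;               // existing known/potential modifier
  technicality: TechnicalityConfidence; // proposed additional axis
}

const example: ModifiedFailureLabel = {
  label: "Unsafe Exposure or Access",
  confidence: "potential",
  technicality: "non-technical",
};
```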