npit opened this issue 7 months ago
Thanks Nikiforos for documenting all of this (and capturing my last reflection accurately :-)). I think it would be interesting to explore a confidence modifier for the technical <-> non-technical axis, just as we have "potential" and "known". I don't think it needs to be implemented on each category in the same way, but there is something special about signaling that it is a non-technical failure. This might be inconvenient to express in our current taxonomy implementation, but I can look into it.
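For illustration only, here is a minimal sketch of what such a modifier could look like, assuming hypothetical names (`FailureAnnotation`, `Confidence`, `TechnicalRelevance`) that are not part of the current taxonomy implementation:

```python
# Hypothetical sketch (not the actual AIID/GMF schema): attaching a
# technical <-> non-technical modifier to a failure label, analogous to
# the existing known/potential confidence modifier.
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    KNOWN = "known"
    POTENTIAL = "potential"


class TechnicalRelevance(Enum):
    TECHNICAL = "technical"
    NON_TECHNICAL = "non-technical"
    AMBIGUOUS = "ambiguous"


@dataclass
class FailureAnnotation:
    label: str                      # e.g. "Unsafe Exposure or Access"
    confidence: Confidence          # existing known/potential modifier
    relevance: TechnicalRelevance   # proposed technical <-> non-technical axis


# Example: a non-technical failure recorded as a potential cause.
annotation = FailureAnnotation(
    label="Misuse",
    confidence=Confidence.POTENTIAL,
    relevance=TechnicalRelevance.NON_TECHNICAL,
)
print(annotation)
```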
The GMF taxonomy characterizes incidents with respect to technical failure causes, i.e. any operation that takes place during the implementation, training, validation, deployment, etc. of the AI system participating in the incident, where that operation is linked to unexpected system performance which in turn generates societal harm. Such failure causes may include methodological errors, design shortcomings, dataset inadequacies, fitting issues, etc.
However, there are a number of incidents in AIID where the reported harm is not a direct result of some technical failure of the deployed AI system, but an outcome of miscellaneous other failings. In order to have some description of failure for all annotated incidents during these early stages of GMF development, I have included labels with ambiguous technical relevance.
Such labels and annotated incidents are listed below and will be updated periodically:
- `Misuse`: known, potential
- `Unsafe Exposure or Access`: known, potential
- `Misinformation Generation Hazard`: known, potential
- `Black Swan Event`: known, potential
- `Harmful Application`: known, potential
- `Malicious Marketing`: known, potential
- `Task Mismatch`: known, potential
- `Deployment Misconfiguration`: known, potential
- `Software Bug`: known, potential

In the future, we will need to decide how to resolve such cases. Should we expand GMF with non-technical failure causes, or should we, e.g., include a catch-all `Non Technical Failure` label?

Additional thoughts from @kepae include classifying non-technical failures as technical ones for AI systems which advertise / guarantee some level of safety. E.g., if a publicly available LLM advertises itself as safe and appropriate for minors and it generates inappropriate outputs from normal prompts, that would be considered a technical failure (e.g. `Inappropriate Training Content`). If the model does not guarantee such a thing, then a non-technical failure description would be something like `Unsafe Exposure or Access`.
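To make that rule concrete, here is a minimal sketch under the assumptions above; the helper name `classify_inappropriate_output` is hypothetical and not part of GMF or the AIID codebase:

```python
# Hypothetical sketch of the rule described above (not actual GMF logic):
# if a system advertises / guarantees a level of safety and then violates it,
# annotate a technical failure; otherwise fall back to a non-technical label.
def classify_inappropriate_output(advertises_safety_for_minors: bool) -> str:
    """Return an illustrative GMF failure label for an LLM that produced
    inappropriate outputs from normal prompts."""
    if advertises_safety_for_minors:
        # The system broke an explicit safety guarantee -> technical failure.
        return "Inappropriate Training Content"
    # No such guarantee was made -> non-technical failure description.
    return "Unsafe Exposure or Access"


print(classify_inappropriate_output(True))   # Inappropriate Training Content
print(classify_inappropriate_output(False))  # Unsafe Exposure or Access
```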