w3c / dpv

Data Privacy Vocabularies and Controls CG (DPVCG)
https://w3id.org/dpv

Adding AI bias concepts #182

Open DelaramGlp opened 3 months ago

DelaramGlp commented 3 months ago

Adding concepts from the AIRO extension for AI bias developed by @drd00.

coolharsh55 commented 3 months ago

Hi @DelaramGlp @drd00 thanks for the proposal. Can you please put (only) the proposed concepts in a separate file or make a list to help with the discussions? I have copied my response to Delaram from yesterday below:

drd00 commented 3 months ago

Hi @coolharsh55, sorry that I couldn't get back to you on this sooner. Below is a list of some proposed concepts based on your response.

Bias taxonomy

From ISO/IEC 24027, there are the following concepts, organised by their position in the derived taxonomy (each with the Bias suffix, to be added to the Risk taxonomy):

Top-level:

Subclass of CognitiveBias:

Subclass of DataBias:

Subclass of SelectionBias:

This taxonomy is further extended with terms from AIO, and with societal biases adapted from the GDPR Article 9 special categories, plus gender bias. AIO's bias categories are themselves based on NIST Special Publication 1270 (https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf), which could be linked to directly, as you mention.
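As a rough illustration of how a GDPR Article 9 derived societal bias could be written in the same Turtle style used later in this thread, here is a minimal sketch. The IRI `risk:PoliticalOpinionBias`, its placement under `risk:SocietalBias`, and the use of `dct:source` to record the legal basis are all assumptions for illustration; an AIO-derived concept could point its `dct:source` at the NIST report URL in the same way.

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix risk: <https://w3id.org/dpv/risk#> .

# Sketch only: hypothetical IRI, not part of the proposal as written
risk:PoliticalOpinionBias a skos:Concept ;
    skos:prefLabel "political opinion bias"@en ;
    # Assumed placement: GDPR Art. 9 derived biases as narrower concepts of societal bias
    skos:broader risk:SocietalBias ;
    # Provenance of the category: "political opinions" in GDPR Art. 9(1)
    dct:source "GDPR Art. 9(1)"@en .
```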

Other bias categories (as above):

In addition, there are the concepts derived from AIO. I redefined these in my implementation, partly because of overlap between the terms taken from ISO 24027 and those from AIO, which could lead to confusing semantics when extending the ISO-derived classes.

Fairness metrics & mitigation measures

The bias mitigation measures are also derived from AI Fairness 360 (AIF360), specifically its algorithms module: https://aif360.readthedocs.io/en/stable/modules/algorithms.html. This is the same resource from which the taxonomy of fairness metrics was extracted.

Regarding fairness metrics, the ISO 24027 standard does introduce fairness metrics for classification systems in Section 8, specifically:

However, this list is quite limited, and each entry is essentially a definition of fairness rather than a measure of it: to represent test results, something like ‘the degree to which equalised odds holds’ is required. Still, since they are based on an international standard, they might provide a good starting point in this context.
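To make the difference between a fairness definition and a measurable degree concrete, the following assumes the usual binary-classification formulation of equalised odds, with prediction Ŷ, true label Y, and a protected attribute A with two groups a and b. The gap Δ_EO is one common way (an assumption here, not necessarily the ISO/IEC 24027 wording) to express "the degree to which equalised odds holds": it is 0 exactly when the definition is satisfied.

```latex
% Definition: equalised odds holds when, for each true label y,
% the positive-prediction rate is the same in both groups
P(\hat{Y}=1 \mid A=a, Y=y) = P(\hat{Y}=1 \mid A=b, Y=y), \qquad y \in \{0,1\}

% Degree: the larger of the true-positive-rate gap (y=1) and the false-positive-rate gap (y=0)
\Delta_{\mathrm{EO}} = \max_{y \in \{0,1\}} \left| P(\hat{Y}=1 \mid A=a, Y=y) - P(\hat{Y}=1 \mid A=b, Y=y) \right|
```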

coolharsh55 commented 3 months ago

Hi. That's a comprehensive proposal and interesting work; well done, and thank you. I think the bias concepts can go in as is, since they are based on an ISO standard. The GDPR special category ones can also be added, except for the following, whose meaning is unclear:

Regarding fairness metrics - are these measures to reduce bias or measure bias? If they reduce it, then they are risk mitigation measures, and if they measure it then they are metrics which must be used with some risk mitigation measure.

TallTed commented 3 months ago
  • Bias related to genetic data - what kind of bias is this which is attached to genetic data, as Racial Bias is a separate concept? Suggest renaming this to GeneticBias - bias based on genetic conditions and features (think genism or genetic discrimination)

Racial Bias might be considered a special case of Genetic Bias. Other possible Genetic Biases could include X and Y variations (there are several); iris colors ("Genes with reported roles in eye color include ASIP, IRF4, SLC24A4, SLC24A5, SLC45A2, TPCN2, TYR, and TYRP1. The effects of these genes likely combine with those of OCA2 and HERC2 to produce a continuum of eye colors in different people."), hair color ("hundreds of genes influence hair color")...

Several Genetic Biases could also be considered (closely related to) Biometric Biases. I foresee a lot of overlap of both of these with Health Biases.

drd00 commented 3 months ago

@coolharsh55 Thanks. Regarding fairness metrics, these serve to measure, rather than mitigate, bias.
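Given that answer, one way to keep metrics out of the mitigation-measure hierarchy is sketched below. All `risk:` IRIs here are hypothetical; `risk:Reweighing` stands for AIF360's Reweighing pre-processing algorithm, and `skos:related` is used only to avoid inventing a dedicated DPV property that links a measure to the metric its effect is checked with.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix risk: <https://w3id.org/dpv/risk#> .

# Metrics measure bias, so they get their own (hypothetical) top concept
risk:FairnessMetric a skos:Concept ;
    skos:prefLabel "fairness metric"@en .

risk:EqualisedOddsMetric a skos:Concept ;
    skos:broader risk:FairnessMetric ;
    skos:prefLabel "equalised odds metric"@en .

# Techniques that reduce bias sit under the existing mitigation measure concept
risk:Reweighing a skos:Concept ;
    skos:broader dpv:RiskMitigationMeasure ;
    skos:prefLabel "reweighing"@en ;
    # Loose, assumed link to a metric the measure's effect could be evaluated with
    skos:related risk:EqualisedOddsMetric .
```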

I agree with @TallTed that there is significant overlap in some of these categories. I certainly agree that referring to racial bias as a special case of genetic bias makes sense. Regarding the distinction between biometric biases and genetic biases, biometric biases might be considered more directly concerned with the perspective of the system. For example, a biometric bias in a facial recognition system could be skin tone bias (where skin tone is perhaps a feature), with a corresponding genetic bias being racial bias. Although skin tone has a significant genetic component, it may make sense in some cases to make such distinctions.

coolharsh55 commented 3 months ago

Hi @TallTed @drd00, thank you for your thoughts. Race is not purely based on genetics but is also a sociological concept, so we should not have RacialBias and GeneticBias in a parent/child relationship. As biometric is a parent concept of genetic in the context of GDPR Art. 4(13) and Art. 4(14), it makes sense, as Ted says, to model BiometricBias as the parent concept of GeneticBias. However, biometric and health data overlap but also differ (GDPR Art. 4(15)), so we shouldn't model them into a hierarchy.
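The modelling decision above could look roughly like this in SKOS. The IRIs are hypothetical, the GeneticBias definition is taken from Ted's suggestion, and using `skos:related` for the biometric/health overlap is an assumption about how a non-hierarchical overlap might be recorded.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix risk: <https://w3id.org/dpv/risk#> .

risk:BiometricBias a skos:Concept ;
    skos:prefLabel "biometric bias"@en .

# Genetic as narrower than biometric, as proposed above (GDPR Art. 4(13), 4(14))
risk:GeneticBias a skos:Concept ;
    skos:prefLabel "genetic bias"@en ;
    skos:broader risk:BiometricBias ;
    skos:definition "bias based on genetic conditions and features"@en .

# Race is also a sociological concept, so deliberately no skos:broader risk:GeneticBias
risk:RacialBias a skos:Concept ;
    skos:prefLabel "racial bias"@en .

# Health overlaps with biometric (GDPR Art. 4(15)) but is kept out of its hierarchy
risk:HealthBias a skos:Concept ;
    skos:prefLabel "health bias"@en ;
    skos:related risk:BiometricBias .
```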

TallTed commented 3 months ago

Race is not purely based on genetics but is also a sociological concept. So we should not have RacialBias and GeneticBias in a parent/child relationship.

True. My bad for responding quickly and without fully thinking. I will note that my writing started from this comment by @coolharsh55, which first blurred genetic and racial:

Bias related to genetic data - what kind of bias is this which is attached to genetic data, as Racial Bias is a separate concept?

As with a great many ontologies, the concepts being modeled are far more complex than they may appear even at 17th glance, never mind first glance.

drd00 commented 3 months ago

Corrections

The previously discussed taxonomy is based on the pre-release ISO/IEC 24027:2020 rather than the ISO/IEC 24027:2021 international standard. The great majority of details in the data bias and cognitive bias categories described previously overlap, but the 2021 standard additionally has an engineering decision bias subtree (subclass of Bias).

The taxonomy may be extracted from the table of contents of ISO/IEC 24027:2021 (image: iso-iec_24027_toc). As I do not have access to the ISO/IEC 24027:2021 full text, term descriptions in the following section are from ISO/IEC 24027:2020.

RDF triples for bias

Below is a tentative representation of ISO/IEC 24027 concepts specifically in the Risk taxonomy.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix risk: <https://w3id.org/dpv/risk#> .

risk:Bias rdf:type rdfs:Class ;
    dct:source "ISO/IEC 24027:2020, 3.3.2"@en ;
    dct:description "systematic difference in treatment of certain objects, people, or groups in comparison to others."@en ;
    rdfs:subClassOf risk:RiskSource .

risk:AIBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:Bias .

# Subclass of risk mitigation measure and assessment?
risk:BiasAudit rdf:type rdfs:Class ;
    rdfs:subClassOf dpv:RiskMitigationMeasure ,
    dpv:Assessment .

##########################################
# Cognitive bias subtree
##########################################
risk:CognitiveBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:AIBias ;
    dct:source "ISO/IEC 24027:2020, 3.3.4"@en ;
    dct:description "human bias that might impact the design and application of a system."@en .

risk:AutomationBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2020, 3.3.1"@en ;
    dct:description "type of human cognitive bias due to over-reliance on the recommendations of an AI system"@en .

risk:GroupAttributionBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2020, 7.2.3"@en ;
    dct:description "Group attribution bias occurs when a human assumes that what is true for an individual or object is also true for everyone, or all objects, in that group."@en .

risk:ImplicitBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2020, 7.2.4"@en ;
    dct:description "Implicit bias occurs when a human makes an association or assumption based on their mental models and memories."@en .

risk:ConfirmationBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2020, 3.3.4"@en ;
    dct:description "type of human cognitive bias that favours predictions of AI systems that confirm pre-existing beliefs or hypotheses."@en .

risk:InGroupBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2020, 7.2.6"@en ;
    dct:description "In-group bias occurs when showing partiality to one's own group or own characteristics."@en .

risk:OutGroupHomogeneityBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2020, 7.2.7"@en ;
    dct:description "Out-group homogeneity bias occurs when seeing out-group members as more alike than in-group members when comparing attitudes, values, personality traits, and other characteristics."@en .

risk:SocietalBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2020, 7.2.9"@en ;
    dct:description "Societal bias occurs when one or more similar cognitive biases (conscious or unconscious) are being held by many individuals in society. This societal bias originates from society at large and could be closely related to other cognitive or statistical biases. It manifests as data available about society that reflects historical patterns. Societal bias can also be considered a type of data bias."@en .

risk:RuleBasedSystemDesign rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias .

risk:RequirementsBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:CognitiveBias .

##########################################
# Data bias subtree
##########################################
risk:DataBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:AIBias ;
    dct:source "ISO/IEC 24027:2020, 3.3.7"@en ;
    dct:description "data properties that if unaddressed lead to AI systems that perform better or worse for different groups."@en .

risk:StatisticalBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO 20501:2019, 3.3.9"@en ;
    dct:description "type of consistent numerical offset in an estimate relative to the true underlying value, inherent to most estimates."@en .

risk:SelectionBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:StatisticalBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.2.1"@en ;
    dct:description "Selection bias occurs when data is not collected randomly from the intended population."@en .

risk:SamplingBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:SelectionBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.2.2"@en ;
    dct:description "Sampling bias occurs when data is not collected randomly from the intended population."@en .

risk:CoverageBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:SelectionBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.2.3"@en ;
    dct:description "Coverage bias occurs when a population represented in a dataset does not match the population that the ML model is making predictions about."@en .

risk:NonResponseBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:SelectionBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.2.4"@en ;
    dct:description "Non-response bias (also called participation bias) occurs when people from certain groups opt-out of surveys at different rates than users from other groups."@en .

risk:ConfoundingVariablesBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:StatisticalBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.8"@en ;
    dct:description "A confounding variable that influences both the dependent variable and independent variable causing a spurious association."@en .

risk:NonNormalityBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:StatisticalBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.9"@en ;
    dct:description "if the data is subject to a different distribution [from normal] (e.g. Chi-Square, Beta, Lorentz, Cauchy, Weibull or Pareto) the result might be biased and misleading."@en .

risk:DataLabelsAndLabellingProcessBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.3"@en ;
    dct:description "The labelling process itself potentially introduces the cognitive or societal biases described in subclause 7.2 to the data."@en .

risk:NonRepresentativeSamplingBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.4"@en ;
    dct:description "If a dataset is not representative of the intended deployment environment, then the model has the potential to learn biases based on the ways in which the data is non-representative."@en .

risk:MissingFeaturesAndLabelsBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.5"@en ;
    dct:description "Features are often missing from individual training samples. If the frequency of missing features is higher for one group than another then this presents another vector for bias."@en .

risk:DataProcessingBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.6"@en ;
    dct:description "Bias might also creep in due to pre-processing (or post-processing) of data, even though the original data would not have led to any bias."@en .

risk:SimpsonsParadoxBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.7"@en ;
    dct:description "Simpson's paradox manifests when a trend that is indicated in individual groups of data reverses when the groups of data are combined."@en .

risk:DataAggregationBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.11, http://arxiv.org/abs/1901.10002"@en ;
    dct:description "Aggregating data covering different groups of objects that might have different statistical distributions can introduce bias into the data used to train AI systems."@en .

risk:DistributedTrainingBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2020, 7.3.12"@en ;
    dct:description "Distributed ML might introduce its own cause for data bias as the different sources of data might not have the same distribution of feature space."@en .

risk:OtherDataBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:DataBias .

##########################################
# Engineering decision bias subtree
##########################################
risk:EngineeringDecisionBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:AIBias .

risk:FeatureEngineeringBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:EngineeringDecisionBias .

risk:AlgorithmSelectionBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:EngineeringDecisionBias .

risk:HyperparameterTuningBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:EngineeringDecisionBias .

risk:InformativenessBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:EngineeringDecisionBias .

risk:ModelBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:EngineeringDecisionBias .

risk:ModelInteractionBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:EngineeringDecisionBias .

risk:ModelExpressivenessBias rdf:type rdfs:Class ;
    rdfs:subClassOf risk:ModelInteractionBias .
coolharsh55 commented 3 months ago

@drd00 thanks for this. While I review the RDF triples [note], you can get access to the ISO standard via the TCD library; ask Delaram to show you how. Which version of the standard did you use? Note that there is no 2020 version. For example, was it at the CD (Committee Draft) stage, in which case concepts could have had major revisions, or at the DIS (Draft International Standard) stage, in which case major changes are unlikely?

[note] One important change is that we use SKOS relations by default in the taxonomy instead of modelling everything as strict OWL classes, and then convert the SKOS into an alternate OWL serialisation.
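A minimal sketch of what this note means for two of the proposed concepts, assuming the straightforward mapping in which each `skos:broader` link becomes `rdfs:subClassOf` in the generated OWL serialisation (the actual DPV conversion may do more). SKOS also allows more than one broader concept, which is one way to express the open question of `risk:BiasAudit` being both a risk mitigation measure and an assessment.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix risk: <https://w3id.org/dpv/risk#> .

# --- Taxonomy (SKOS, the default) ---
risk:SamplingBias a skos:Concept ;
    skos:prefLabel "sampling bias"@en ;
    skos:broader risk:SelectionBias .

# SKOS permits several broader concepts for one concept
risk:BiasAudit a skos:Concept ;
    skos:prefLabel "bias audit"@en ;
    skos:broader dpv:RiskMitigationMeasure , dpv:Assessment .

# --- Alternate OWL serialisation (normally a separate file) ---
# Assumed mapping: skos:broader -> rdfs:subClassOf
risk:SamplingBias a owl:Class ;
    rdfs:subClassOf risk:SelectionBias .

risk:BiasAudit a owl:Class ;
    rdfs:subClassOf dpv:RiskMitigationMeasure , dpv:Assessment .
```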

drd00 commented 2 months ago

UPDATE: New triples below, using SKOS relations (I used classes from risk/risk.ttl as a reference). Each definition is extracted from the section of ISO/IEC 24027:2021 that describes the concept.

@coolharsh55 Should I remove internal references from definitions?

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix risk: <https://w3id.org/dpv/risk#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

risk:Bias rdf:type rdfs:Class ,
        skos:Concept ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "systematic difference in treatment of certain objects, people, or groups in comparison to others."@en ;
    skos:prefLabel "bias"@en ;
    rdfs:subClassOf risk:RiskSource .

risk:AIBias rdf:type rdfs:Class ,
        skos:Concept ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:prefLabel "AI bias"@en ;
    rdfs:subClassOf risk:Bias .

# Subclass of risk mitigation measure and assessment?
risk:BiasAudit rdf:type rdfs:Class ,
        skos:Concept ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:prefLabel "bias audit"@en ;
    rdfs:subClassOf dpv:RiskMitigationMeasure ,
    dpv:Assessment .

##########################################
# Cognitive bias subtree
##########################################
risk:CognitiveBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:AIBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "bias that occurs when humans are processing and interpreting information."@en ;
    skos:prefLabel "cognitive bias"@en .

risk:AutomationBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "propensity for humans to favour suggestions from automated decision-making systems and to ignore contradictory information made without automation, even if it is correct."@en ;
    skos:prefLabel "automation bias"@en .

risk:GroupAttributionBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Group attribution bias occurs when a human assumes that what is true for an individual or object is also true for everyone, or all objects, in that group."@en ;
    skos:prefLabel "group attribution bias"@en .

risk:ImplicitBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Implicit bias occurs when a human makes an association or assumption based on their mental models and memories."@en ;
    skos:prefLabel "implicit bias"@en .

risk:ConfirmationBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Confirmation bias occurs when hypotheses, regardless of their veracity, are more likely to be confirmed by the intentional or unintentional interpretation of information."@en ;
    skos:prefLabel "confirmation bias"@en .

risk:InGroupBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "In-group bias occurs when showing partiality to one's own group or own characteristics."@en ;
    skos:prefLabel "in-group bias"@en .

risk:OutGroupHomogeneityBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Out-group homogeneity bias occurs when seeing out-group members as more alike than in-group members when comparing attitudes, values, personality traits, and other characteristics."@en ;
    skos:prefLabel "out-group homogeneity bias"@en .

risk:SocietalBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Societal bias occurs when similar cognitive bias (conscious or unconscious) is being held by many individuals in society."@en ;
    skos:prefLabel "societal bias"@en .

risk:RuleBasedSystemDesign rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Developer experience and expert advice can have a significant influence on rule-based system design while also potentially introducing various forms of human cognitive bias."@en ;
    skos:prefLabel "rule-based system design"@en .

risk:RequirementsBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:CognitiveBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    # Perhaps do not include internal references in the skos:definition?
    skos:definition "Requirements creation presents occasions for the human cognitive biases listed in 6.2 to manifest."@en ;
    skos:prefLabel "requirements bias"@en .

##########################################
# Data bias subtree
##########################################
risk:DataBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:AIBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "data properties that if unaddressed lead to AI systems that perform better or worse for different groups."@en ;
    skos:prefLabel "data bias"@en .

risk:StatisticalBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO 20501:2019" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "type of consistent numerical offset in an estimate relative to the true underlying value, inherent to most estimates."@en ;
    skos:prefLabel "statistical bias"@en .

risk:SelectionBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:StatisticalBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Selection bias occurs when a dataset's samples are chosen in a way that is not reflective of their real-world distribution."@en ;
    skos:prefLabel "selection bias"@en .

risk:SamplingBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:SelectionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Sampling bias occurs when data records are not collected randomly from the intended population."@en ;
    skos:prefLabel "sampling bias"@en .

risk:CoverageBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:SelectionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Coverage bias occurs when a population represented in a dataset does not match the population that the ML model is making predictions about."@en ;
    skos:prefLabel "coverage bias"@en .

risk:NonResponseBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:SelectionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Non-response bias (also called participation bias) occurs when people from certain groups opt-out of surveys at different rates than users from other groups."@en ;
    skos:prefLabel "non-response bias"@en .

risk:ConfoundingVariablesBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:StatisticalBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "A confounding variable that influences both the dependent variable and independent variable causing a spurious association."@en ;
    skos:prefLabel "confounding variables bias"@en .

risk:NonNormalityBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:StatisticalBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Most statistical methods assume that the dataset is subject to a normal distribution. However, if the dataset is subject to a different distribution (e.g., Chi-Square, Beta, Lorentz, Cauchy, Weibull or Pareto) the results can be biased and misleading."@en ;
    skos:prefLabel "non-normality bias"@en .

risk:DataLabelsAndLabellingProcessBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "The labelling process itself potentially introduces the cognitive or societal biases described in 6.2 to the data."@en ;
    skos:prefLabel "data labels and labelling process bias"@en .

risk:NonRepresentativeSamplingBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "If a dataset is not representative of the intended deployment environment, then the model has the potential to learn biases based on the ways in which the data is non-representative."@en ;
    skos:prefLabel "non-representative sampling bias"@en .

risk:MissingFeaturesAndLabelsBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Features are often missing from individual training samples. If the frequency of missing features is higher for one group than another then this presents another vector for bias."@en ;
    skos:prefLabel "missing features and labels bias"@en .

risk:DataProcessingBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Bias can also creep in due to pre-processing (or post-processing) of data, even though the original data would not have led to any bias."@en ;
    skos:prefLabel "data processing bias"@en .

risk:SimpsonsParadoxBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Simpson's paradox manifests when a trend that is indicated in individual groups of data reverses when the groups of data are combined."@en ;
    skos:prefLabel "Simpson's paradox bias"@en .

risk:DataAggregationBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Aggregating data covering different groups of objects that might have different statistical distributions can introduce bias into the data used to train AI systems."@en ;
    skos:prefLabel "data aggregation bias"@en .

risk:DistributedTrainingBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Distributed ML can introduce its own cause for data bias as the different sources of data might not have the same distribution of feature space."@en ;
    skos:prefLabel "distributed training bias"@en .

risk:OtherDataBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:DataBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "The data and any labels can also be biased by artefacts or other disturbing influences."@en ;
    skos:prefLabel "other data bias"@en .

##########################################
# Engineering decision bias subtree
##########################################
risk:EngineeringDecisionBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:AIBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "ML model architectures - encompassing all model specifications, parameters and manually designed features - can be biased in several ways. Data bias and human cognitive bias can contribute to such bias."@en ;
    skos:prefLabel "engineering decision bias"@en .

risk:FeatureEngineeringBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:EngineeringDecisionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Steps such as encoding, data type conversion, dimensionality reduction and feature selection are subject to choices made by the AI developer and can introduce bias in the ML model."@en ;
    skos:prefLabel "feature engineering bias"@en .

risk:AlgorithmSelectionBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:EngineeringDecisionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "The selection of ML algorithms built into the AI system can introduce unwanted bias in predictions made by the system. This is because the type of algorithm used introduces a variation in the performance of the ML model."@en ;
    skos:prefLabel "algorithm selection bias"@en .

risk:HyperparameterTuningBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:EngineeringDecisionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Hyperparameters define how the model is structured and cannot be directly trained from the data like model parameters. Thus, hyperparameters affect the model functioning and accuracy of the model and thus can potentially lead to bias."@en ;
    skos:prefLabel "hyperparameter tuning bias"@en .

risk:InformativenessBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:EngineeringDecisionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "For some groups, the mapping between inputs present in the data and outputs are more difficult to learn. This can happen when some features are highly informative about one group, while a different set of features is highly informative about another group. If this is the case, then a model that only has one feature set available, can be biased against the group whose relationships are difficult to learn from available data."@en ;
    skos:prefLabel "informativeness bias"@en .

risk:ModelBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:EngineeringDecisionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "Given that ML often uses functions like a maximum likelihood estimator to determine parameters, if there is data skew or under-representation present in the data, the maximum likelihood estimation tends to amplify any underlying bias in the distribution."@en ;
    skos:prefLabel "model bias"@en .

risk:ModelInteractionBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:EngineeringDecisionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "It is possible for the structure of a model to create biased predictions."@en ;
    skos:prefLabel "model interaction bias"@en .

risk:ModelExpressivenessBias rdf:type rdfs:Class ,
        skos:Concept ;
    rdfs:subClassOf risk:ModelInteractionBias ;
    dct:source "ISO/IEC 24027:2021" ;
    dct:contributor "Daniel Doherty" ;
    dct:created "2024-09-13"^^xsd:date ;
    skos:definition "The number and nature of parameters in a model as well as the neural network topology can affect the expressiveness of the model. Any feature that affects model expressiveness differently across groups has the potential to cause bias."@en ;
    skos:prefLabel "model expressiveness bias"@en .
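To make the hierarchy in the declarations above easier to check, here is a minimal stdlib-only Python sketch that follows the `rdfs:subClassOf` edges transitively, roughly what a SPARQL `rdfs:subClassOf+` property path or an RDFS reasoner would compute. It assumes only the edges visible in these snippets; the parents of `risk:EngineeringDecisionBias` and `risk:FeatureEngineeringBias` are declared elsewhere and are deliberately left out, and the `ancestors`/`descendants` helpers are hypothetical names used for illustration.

```python
from collections import deque

# rdfs:subClassOf edges (child -> direct parents), copied from the Turtle
# declarations above. Assumption: parents declared elsewhere are omitted.
SUBCLASS_OF: dict[str, list[str]] = {
    "risk:AlgorithmSelectionBias": ["risk:EngineeringDecisionBias"],
    "risk:HyperparameterTuningBias": ["risk:EngineeringDecisionBias"],
    "risk:InformativenessBias": ["risk:EngineeringDecisionBias"],
    "risk:ModelBias": ["risk:EngineeringDecisionBias"],
    "risk:ModelInteractionBias": ["risk:EngineeringDecisionBias"],
    "risk:ModelExpressivenessBias": ["risk:ModelInteractionBias"],
}


def ancestors(concept: str) -> list[str]:
    """Return all transitive superclasses of `concept`, nearest first (BFS)."""
    seen: list[str] = []
    queue = deque(SUBCLASS_OF.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.append(parent)
            queue.extend(SUBCLASS_OF.get(parent, []))
    return seen


def descendants(concept: str) -> list[str]:
    """Return, sorted, every known concept that is transitively a subclass of `concept`."""
    return sorted(c for c in SUBCLASS_OF if concept in ancestors(c))


if __name__ == "__main__":
    print(ancestors("risk:ModelExpressivenessBias"))
    # ['risk:ModelInteractionBias', 'risk:EngineeringDecisionBias']
    print(descendants("risk:ModelInteractionBias"))
    # ['risk:ModelExpressivenessBias']
```

A breadth-first walk keeps the nearest parent first, which makes it easy to see that model expressiveness bias is reached through model interaction bias rather than attached directly to engineering decision bias.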
coolharsh55 commented 2 months ago

Hi @drd00, thanks for the comprehensive work. These look good. I'll put them in a spreadsheet and get them into the DPV specs. A few noticeable changes will have to be made (I can do that):

For all intents and purposes, I think these concepts should be considered accepted. The rest of the work is polishing them and adding them to the documentation, which I hope to do later this week. I will review the final output with you before it goes live.

coolharsh55 commented 2 months ago

@drd00 @DelaramGlp I have added the RDF concepts to RISK and AI extensions. In this, I changed the following (please review):

  1. Split the concepts into "general bias" concepts (which go in RISK) and "AI-specific bias" concepts (which go in the AI extension)
  2. Changed the definitions to make them consistent with how concepts are declared
  3. Removed OtherDataBias as it is a non-concept: it represents biases not categorised by the other concepts, and for those the generic concept Bias or DataBias is used directly.

The live version is:

The spreadsheets are:

Please also advise on who should be listed as contributors for these concepts (other than you). Delaram? Me? Someone else?

drd00 commented 2 months ago

@coolharsh55 Apologies for the delayed response. That looks perfect to me. Regarding contributors, I think it's fine to list me, you and Delaram, since each of us played a different part in integrating these terms into DPV.

coolharsh55 commented 2 months ago

Thanks @drd00. Do you or @DelaramGlp know how we should relate this work to Doc-bias? https://github.com/tibonto/Doc-BIAS/blob/main/docbias-schema.ttl I think it has more concepts related to bias types etc., but I didn't see links back to ISO. Do we put a link to this work in the documentation, or is a more thorough alignment needed, where we map our concepts to theirs?

drd00 commented 2 months ago

@coolharsh55 Doc-BiasO uses terms from the AI Ontology, which is itself based on a NIST categorisation of bias in AI (https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf). That categorisation is not derived from ISO and contains a number of concepts not included in the ISO categorisation. However, there is significant overlap, so aligning terms is possible to an extent, if desired.

coolharsh55 commented 2 months ago

Thanks Daniel. I will add a note to this effect in the HTML documentation.