medtorch / Q-Aid-Core

An intuitive platform for deploying the latest discoveries in healthcare AI to everybody's phones. Powered by PyTorch!

Datasets + Biased models + Interpretability methods + POC #4

Closed tudorcebere closed 3 years ago

tudorcebere commented 4 years ago

Ideally, we should pull the relevant information from here to see which valuable PoCs we could build, and on what: https://github.com/pbiecek/xai_resources

andreimano commented 4 years ago

I can take care of this.

bcebere commented 4 years ago

There are a lot more hot topics surrounding bias in ML; we need to have a way to detect them first:

Searching Twitter for "machine learning bias" returns several interesting stories:

  1. 80 Million Tiny Images dataset controversy: Can we detect that the training "overfits" certain traits? https://www.theregister.com/2020/07/01/mit_dataset_removed/

  2. PULSE model controversy: Can we spot this scenario? https://www.theverge.com/21298762/face-depixelizer-ai-machine-learning-tool-pulse-stylegan-obama-bias

  3. ImageNet bias issues: https://hyperallergic.com/518822/600000-images-removed-from-ai-database-after-art-project-exposes-racist-bias/

  4. COMPAS algorithm issue: "ProPublica discovered that the COMPAS algorithm was able to predict the particular tendency of a convicted criminal to reoffend. However, with COMPAS, black offenders were evaluated as almost twice as likely as white offenders to be labeled a higher risk but not actually reoffend. On the other hand, white offenders were more often labeled as lower risk of reoffending than black offenders, despite their criminal history."

  5. Can we review credit-score algorithms? https://www.gsb.stanford.edu/insights/big-data-racial-bias-can-ghost-be-removed-machine

  6. In a December 2019 study by the National Institute of Standards and Technology (NIST), researchers found evidence of racial bias in nearly 200 facial recognition algorithms. https://www.nist.gov/news-events/news/2019/12/nist-study-evaluates-effects-race-age-sex-face-recognition-software

  7. Can we spot issues in the most popular apps that our method could have prevented? https://news.gallup.com/poll/228497/americans-already-using-artificial-intelligence-products.aspx

  8. Can we detect Sampling bias? (A detection sketch follows after this list.)

"A sampling bias happens when data is collected in a manner that oversamples from one community and undersamples from another. This might be intentional or unintentional. The result is a model that is overrepresented by a particular characteristic, and as a result is weighted or biased in that way. The ideal sampling should either be completely random or match the characteristics of the population to be modeled."

  9. Can we detect Measurement bias?

"Measurement bias is the result of not accurately measuring or recording the data that has been selected. For example, if you are using salary as a measurement, there might be differences in salary including bonus or other incentives, or regional differences in the data. Other measurement bias can result from using incorrect units, normalizing data in incorrect ways or miscalculations."

  10. Can we detect Exclusion bias?

"Exclusion bias arises from data that is inappropriately removed from the data source. When you have petabytes or more of data, it's tempting to select a small sample to use for training, but when doing so you might be inadvertently excluding certain data, resulting in a biased data set. Exclusion bias can also occur due to removing duplicates from data when the data elements are actually distinct."

  1. Can we detect "Experimenter or observer bias"?

" the act of recording data itself can be biased. When recording data, the experimenter or observer might only record certain instances of data, skipping others. Perhaps you're creating a machine learning model based on sensor data but only sampling every few seconds, missing key data elements. Or there is some other systemic issue in the way that the data has been observed or recorded. In some instances, the data itself might even become biased by the act of observing or recording that data, which could trigger behavioral changes."

  1. Can we detect "Prejudicial bias" ?

"data might become tainted by bias based on human activities that under-selected certain communities and over-selected others. When using historical data to train models, especially in areas that have previously been rife with prejudicial bias, care should be taken to make sure new models don't incorporate that bias."

  1. Can we detect "Confirmation bias" ?

    "Confirmation bias is the desire to select only the information that supports or confirms something you already know, rather than data that might suggest something that runs counter to preconceived notions. The result is data that is tainted because it was selected in a biased manner or because information that doesn't confirm the preconceived notion is thrown out."

  14. Can we detect Bandwagoning?

"The bandwagon effect is a form of bias that happens when there is a trend occurring in the data or in some community. As the trend grows, the data supporting that trend increases and data scientists run the risk of overrepresenting the idea in the data they collect. Moreover, any significance in the data may be short-lived: The bandwagon effect could disappear as quickly as it appeared."

There are conferences on the topic too: https://events.drupal.org/global2020/sessions/combatting-bias-machine-learning

andreimano commented 4 years ago

**Note:** I deleted the previous comments and centralized everything (in English) in this one. This comment will be further updated.

(0. DISCUSSION) In [1] the authors identify 23 types of bias, and also observe that these biases can be intertwined (Fig. 2). [1] further draws a distinction between kinds of bias detection, categorizing methods as "pre-processing", "in-processing" and "post-processing". I think we should focus on the "in-processing" and "post-processing" methods: should a data bias detection technique be included in a deep learning library, or should it be decoupled from it? Detecting bias in a trained model may make more sense, since you implicitly get hints about the dataset bias as well; a minimal post-processing audit sketch follows below.
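As a starting point for the post-processing direction, here is a minimal, model-agnostic audit sketch in PyTorch; all tensors are illustrative stand-ins for a model's binary decisions, the ground-truth labels, and a sensitive attribute. It computes the demographic parity gap and the false positive rate gap, the COMPAS-style disparity from story 4 above:

```python
import torch

preds  = torch.tensor([1, 0, 1, 1, 0, 1, 0, 0])  # model decisions (illustrative)
labels = torch.tensor([1, 0, 0, 1, 0, 0, 0, 1])  # ground truth
group  = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])  # sensitive attribute (0/1)

# Demographic parity: positive decision rate per group.
def positive_rate(g):
    return preds[group == g].float().mean()

dp_gap = (positive_rate(0) - positive_rate(1)).abs()

# False positive rate per group: positive decisions among true negatives.
def false_positive_rate(g):
    negatives = (group == g) & (labels == 0)
    return preds[negatives].float().mean()

fpr_gap = (false_positive_rate(0) - false_positive_rate(1)).abs()

print(f"demographic parity gap: {dp_gap:.2f}, FPR gap: {fpr_gap:.2f}")
```

A large gap on either metric would justify a closer look at the training data, and a routine like this could run as a hook after training, which is one way to settle the "included vs. decoupled" question above.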

(1. DATA) We could use the following datasets for training models and identifying bias:

(2. MODELS) We can use and analyse the following (deep) models:

(3. INTERPRETABILITY) Methods for detecting bias in deep models (a minimal attribution sketch follows after the references below):

Also see: #1 and #21

References:

- https://scholar.google.be/citations?user=EuFF9kUAAAAJ&hl=nl
- https://www.cs.toronto.edu/~toni/Papers/icml-final.pdf
- https://papers.nips.cc/paper/9603-on-the-fairness-of-disentangled-representations
- https://arxiv.org/pdf/1908.09635.pdf
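As one concrete probe for this section, here is a minimal sketch using Captum, the interpretability library for PyTorch; the tiny classifier and the random input are hypothetical stand-ins for a trained Q-Aid model and a real image. Integrated Gradients attributes a prediction back to input pixels, so a reviewer can check whether the model leans on sensitive or spurious regions rather than the pathology itself:

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Hypothetical stand-in classifier; in practice, the trained medical model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
model.eval()

x = torch.randn(1, 3, 32, 32)        # stand-in for one input image
pred_class = model(x).argmax(dim=1)  # the class the model actually predicts

# Integrated Gradients attribution of the predicted class w.r.t. the input.
ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    x, target=pred_class, return_convergence_delta=True
)

# Attribution mass concentrated on clinically irrelevant regions (background,
# skin tone, scanner artifacts) would be a red flag worth manual review.
print(attributions.abs().sum().item(), delta.item())
```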

(4. PROOF OF CONCEPT) TODO

(5. MOTIVATION) As for the motivation for the work, we can cite the following resources:

Papers:

Media: