greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Discrimination + Machine Learning #302

Open cgreene opened 7 years ago

cgreene commented 7 years ago

In the subsection of 03_categorize.md for #297 this came up. I think we need a bit more discussion around the topic to sufficiently resolve this. In the interests of time, I'm creating this issue for discussion before we go back to improve the section.

cgreene commented 7 years ago

@traversc : I created this for discussion. I think there's quite a bit of literature on differences in prescription practices by doctors based on racial and ethnic groups. I don't think we need an example of this being embedded into an ML model for this section. I think showing that it would exist in training data is sufficient.

cgreene commented 7 years ago

@agitter : I don't think that the potential for discrimination needs to be deep-learning specific (referring to https://github.com/greenelab/deep-review/pull/297#discussion_r110517161 ). I am hoping we can provide selected examples for readers to consider. I think @davharris also raised this discussion on Twitter not too long ago.

davharris commented 7 years ago

I don't know a whole lot about the medical/genetics side of things, but here's a list of things that come to mind. This is from memory, so I might get some details of the anecdotes wrong, but I can look the details up if they'd be helpful.

I have lots more to say about this sort of thing, but I think I'll stop here for now. Hope some of it's useful. Let me know if you have any questions about my examples or if you're looking for something else.

traversc commented 7 years ago

@cgreene @davharris:

I found an article discussing differences in opioid prescription rates by ethnicity. If you think it's a good example, I can write a short summary for the introduction section.

http://ajph.aphapublications.org/doi/abs/10.2105/AJPH.93.12.2067

PS: apologies for the deleted comment... we frequently upload things to our lab group github, so I'm often logged into the wrong account.

agitter commented 7 years ago

@davharris covered some of the examples that I had in mind when I left the comment in #297 (and more). Kate Crawford has written about these and related examples, and her NYT article could serve as a reference.

agitter commented 7 years ago

A new relevant paper is "Semantics derived automatically from language corpora contain human-like biases", with an accompanying discussion.
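For anyone following along, the core measurement in that paper (the Word Embedding Association Test, WEAT) is just a differential cosine-similarity score between word sets. A minimal sketch, using hand-made 2-d toy vectors in place of real trained embeddings (the word sets and vectors here are invented for illustration):

```python
import numpy as np

def cos_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B):
    # s(w, A, B): how much more similar word vector w is to attribute set A than to B
    return np.mean([cos_sim(w, a) for a in A]) - np.mean([cos_sim(w, b) for b in B])

def weat_statistic(X, Y, A, B):
    # WEAT test statistic: differential association of target sets X and Y
    # with attribute sets A and B
    return sum(assoc(x, A, B) for x in X) - sum(assoc(y, A, B) for y in Y)

# Toy 2-d "embeddings" (made up for illustration): the first axis stands
# for one attribute pole, the second for the other.
A = [np.array([1.0, 0.0])]   # e.g. pleasant-attribute words
B = [np.array([0.0, 1.0])]   # e.g. unpleasant-attribute words
X = [np.array([0.9, 0.1])]   # target words drawn toward A
Y = [np.array([0.1, 0.9])]   # target words drawn toward B

print(weat_statistic(X, Y, A, B))  # positive: X associates with A, Y with B
```

On real corpora-trained embeddings, the same statistic recovers human-like biases; the point here is only that the test itself is simple enough to audit.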

cgreene commented 7 years ago

@davharris : This seems to be a topic that you're knowledgeable on. Thank you for contributing your knowledge thus far. Do you want to write a paragraph touching on this? If so, we'd love to have you contribute as a coauthor. If not, I can write based on the materials that you compiled.

davharris commented 7 years ago

Here's what I have. I'd open a PR for it myself, but it looks like the repository has a lot of structure (especially involving references) that I don't want to break.


Research samples are frequently non-representative of the general population of interest; they tend to be sicker [@doi:10.1086/512821], more male [@doi:10.1016/j.neubiorev.2010.07.002], and more European in ancestry [@doi:10.1371/journal.pbio.1001661]. One well-known consequence of these biases in genomics is that penetrance is consistently lower in the general population than case-control data would imply, as reviewed in @doi:10.1086/512821. Moreover, genetic associations that hold in one population may not hold in other populations with different patterns of linkage disequilibrium [even when population stratification is explicitly controlled for; @doi:10.1038/nrg2813]. As a result, many genomic findings are of limited value for people of non-European ancestry [@doi:10.1371/journal.pbio.1001661]. Methods have been developed for mitigating some of these problems in genomic studies [@doi:10.1086/512821; @doi:10.1038/nrg2813], but it is not clear how easily they can be adapted for deep models that are designed specifically to extract subtle effects from high-dimensional data. For example, differences in the equipment that tended to be used for cases versus controls have led to spurious genetic findings [e.g. @10.1126/science.333.6041.404-a]; in some contexts, it may not be possible to correct for all of these differences to the degree that a deep network is unable to exploit them. The availability of such nominally-irrelevant but highly-predictive features, or of features whose value would ordinarily be known only after the machine learning task is complete, is called "leakage" [@doi:10.1145/2382577.2382579]. When leakage is severe, our models may say more about the way the data were collected than about anything of scientific or predictive value, with potentially disastrous policy consequences [@doi:10.1111/j.1740-9713.2016.00960.x]. Kaufman et al. [@doi:10.1145/2382577.2382579] discuss some ways in which leakage and its effects can be controlled, but the problem is far from solved.
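As a side note for reviewers of this draft: the equipment-leakage failure mode above is easy to reproduce in a toy simulation. Everything below is invented for illustration (the features, the noise levels, and a simple nearest-centroid classifier standing in for a deep model): a "batch" feature records which machine a sample was run on, and because cases and controls were hypothetically run on different machines, it almost perfectly encodes the label.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 800, 800
n = n_train + n_test

# Hypothetical cohort: a weak biological signal plus a "batch" feature.
y = rng.integers(0, 2, n)
signal = y + rng.normal(0.0, 3.0, n)   # weak true effect, heavy noise
batch = y + rng.normal(0.0, 0.1, n)    # nominally irrelevant, highly predictive

def nearest_centroid_accuracy(X, y, n_train):
    """Standardize features, fit class centroids on the training split,
    and report accuracy on the held-out split."""
    mu, sd = X[:n_train].mean(axis=0), X[:n_train].std(axis=0)
    Z = (X - mu) / sd
    c0 = Z[:n_train][y[:n_train] == 0].mean(axis=0)
    c1 = Z[:n_train][y[:n_train] == 1].mean(axis=0)
    Z_test, y_test = Z[n_train:], y[n_train:]
    pred = (np.linalg.norm(Z_test - c1, axis=1)
            < np.linalg.norm(Z_test - c0, axis=1)).astype(int)
    return float((pred == y_test).mean())

acc_leaky = nearest_centroid_accuracy(np.column_stack([signal, batch]), y, n_train)
acc_clean = nearest_centroid_accuracy(signal[:, None], y, n_train)
print(f"with leaky batch feature: {acc_leaky:.2f}")  # near-perfect
print(f"signal only:              {acc_clean:.2f}")  # barely better than chance
```

Dropping the leaky feature collapses performance to roughly the strength of the real signal, which is one quick diagnostic when a model looks suspiciously accurate.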

@article{doi:10.1086/512821, title={Overcoming the winner’s curse: estimating penetrance parameters from case-control data}, author={Z{\"o}llner, Sebastian and Pritchard, Jonathan K}, journal={The American Journal of Human Genetics}, volume={80}, number={4}, pages={605--615}, year={2007}, publisher={Elsevier} }

@article{doi:10.1038/nrg2813, title={New approaches to population stratification in genome-wide association studies}, author={Price, Alkes L and Zaitlen, Noah A and Reich, David and Patterson, Nick}, journal={Nature Reviews Genetics}, volume={11}, number={7}, pages={459--463}, year={2010}, publisher={Nature Publishing Group} }

@misc{10.1126/science.333.6041.404-a, title={Retraction}, author={Sebastiani, Paola and Solovieff, Nadia and Puca, Annibale and Hartley, Stephen W and Melista, Efthymia and Andersen, Stacy and Dworkis, Daniel A and Wilk, Jemma B and Myers, Richard H and Steinberg, Martin H and others}, year={2011}, publisher={American Association for the Advancement of Science} }

@article{doi:10.1145/2382577.2382579, title={Leakage in data mining: Formulation, detection, and avoidance}, author={Kaufman, Shachar and Rosset, Saharon and Perlich, Claudia and Stitelman, Ori}, journal={ACM Transactions on Knowledge Discovery from Data (TKDD)}, volume={6}, number={4}, pages={15}, year={2012}, publisher={ACM} }

@article{doi:10.1016/j.neubiorev.2010.07.002, title={Sex bias in neuroscience and biomedical research}, author={Beery, Annaliese K and Zucker, Irving}, journal={Neuroscience \& Biobehavioral Reviews}, volume={35}, number={3}, pages={565--572}, year={2011}, publisher={Elsevier} }

@article{doi:10.1371/journal.pbio.1001661, title={Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study}, author={Carlson, Christopher S and Matise, Tara C and North, Kari E and Haiman, Christopher A and Fesinmeyer, Megan D and Buyske, Steven and Schumacher, Fredrick R and Peters, Ulrike and Franceschini, Nora and Ritchie, Marylyn D and others}, journal={PLoS Biol}, volume={11}, number={9}, pages={e1001661}, year={2013}, publisher={Public Library of Science} }

@article{doi:10.1111/j.1740-9713.2016.00960.x, title={To predict and serve?}, author={Lum, Kristian and Isaac, William}, journal={Significance}, volume={13}, number={5}, pages={14--19}, year={2016}, publisher={Wiley Online Library} }

davharris commented 7 years ago

uh, with apologies to the users named @article, @doi, and @misc for pinging them.

aaronsheldon commented 7 years ago

...and an example of automated discrimination in practice: "Automated Inference on Criminality using Face Images". It may be worth a quick sentence on the problems with the cited research?

cgreene commented 7 years ago

@aaronsheldon : can you file another PR to add a sentence?

davharris commented 7 years ago

I thought about this paper, but I was disinclined to reward those folks with a citation.

akundaje commented 7 years ago

I would agree with not citing this paper.

On May 9, 2017 11:51 AM, "David J. Harris" notifications@github.com wrote:

I thought about this paper, but I was disinclined to reward those folks with a citation.


cgreene commented 7 years ago

I think that the mention should clearly indicate the problems with the work. Including it gives us the chance to take a strong stand. The role of a citation is to say that the work exists. A critical citation won't register as such with citation counters, but anyone who reads our paper will see how the work is viewed.

Edit: This comment was constructed with feedback from @ctb and @strasser

davharris commented 7 years ago

I'd still rather cite one of the articles criticizing the analysis (e.g. 1 2 3) than the analysis itself.

cgreene commented 7 years ago

Article 3 in the list is really nicely done. This in particular is key:

[screenshot of the key passage from article 3, 2017-05-09]

I'm strongly supportive of citing that.