smart-on-fhir / chart-review

Measure agreement between chart reviewers.
https://docs.smarthealthit.org/cumulus/chart-review/
Apache License 2.0

Implement Cohen's Kappa for "agreement" between two reviewers #27

Closed comorbidity closed 2 months ago

comorbidity commented 2 months ago

Thus far we have been using F1, and for good reason. We should also support the traditional Cohen's kappa (κ) statistic. https://en.wikipedia.org/wiki/Cohen%27s_kappa

A simple implementation is available here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html

The scikit-learn implementation could replace the prior placeholder; it effectively performs the same statistical calculation. https://github.com/smart-on-fhir/chart-review/blob/827dda819f899b0fec3e2e41677b1b842c0a3caf/chart_review/kappa.py#L4
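
A minimal sketch of how the scikit-learn call could be used to compare two reviewers' labels over the same set of notes (the reviewer names and label lists below are purely illustrative, not the project's actual data model):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical example: each list holds one reviewer's binary judgment
# (1 = label present, 0 = label absent) for the same ten chart notes.
reviewer_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
reviewer_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Cohen's kappa corrects observed agreement for chance agreement:
# 1.0 is perfect agreement, 0.0 is no better than chance.
kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"kappa = {kappa:.3f}")
```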

comorbidity commented 2 months ago

For the record, the motivation for F1 is that it is especially useful when the "disease", phenotype, or "thing being measured" is rare. That's because the F1 score scales with respect to prevalence: F1 is the harmonic mean of recall and PPV. https://pubmed.ncbi.nlm.nih.gov/15684123/
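
For reference, a small worked example (with hypothetical confusion-matrix counts) showing how F1 is built from recall and PPV and never touches true negatives, which is why it stays informative when the phenotype is rare:

```python
# Hypothetical counts for a rare phenotype: 8 true positives,
# 2 false positives, 4 false negatives, and a very large number
# of true negatives (which F1 ignores entirely).
tp, fp, fn = 8, 2, 4

ppv = tp / (tp + fp)                     # precision: 8 / 10 = 0.80
recall = tp / (tp + fn)                  # sensitivity: 8 / 12 ≈ 0.67
f1 = 2 * ppv * recall / (ppv + recall)   # harmonic mean ≈ 0.73

print(f"PPV={ppv:.2f} recall={recall:.2f} F1={f1:.2f}")
```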