comorbidity closed this issue 2 months ago
For the record, F1 is especially useful when the "disease", phenotype, or "thing being measured" is rare. That's because F1 is the harmonic mean of recall (sensitivity) and PPV (precision). PPV reflects prevalence, and neither term counts true negatives, so F1 is not inflated by the large number of negatives that a rare condition produces. https://pubmed.ncbi.nlm.nih.gov/15684123/
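A minimal sketch of that point, using made-up counts for illustration (`f1_score` and `accuracy` here are hypothetical helpers, not chart-review functions). The positive-case counts are held fixed while the number of true negatives grows. F1 does not move, but raw agreement (accuracy) climbs toward 1 because it is dominated by negatives.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of PPV (precision) and recall (sensitivity).

    True negatives never enter the formula, so a large pool of
    negative cases (a rare condition) does not inflate the score.
    """
    if tp == 0:
        return 0.0  # no true positives: PPV and recall are 0 (or undefined)
    ppv = tp / (tp + fp)     # positive predictive value / precision
    recall = tp / (tp + fn)  # sensitivity
    return 2 * ppv * recall / (ppv + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Raw agreement, shown for contrast: dominated by TN when positives are rare."""
    return (tp + tn) / (tp + fp + fn + tn)


# Same positive-case counts, very different numbers of negatives (illustrative).
counts = dict(tp=8, fp=2, fn=2)
print(f1_score(**counts))              # about 0.8 regardless of tn
print(accuracy(**counts, tn=10))       # 18/22, about 0.818
print(accuracy(**counts, tn=10_000))   # 10008/10012, about 0.9996
```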
So far we have been using F1, and for good reason. We should also support the traditional Cohen's kappa (κ) statistic. https://en.wikipedia.org/wiki/Cohen%27s_kappa
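For reference, here is the standard unweighted definition written in terms of a 2×2 agreement table between two annotators. Labeling the cells TP/FP/FN/TN (treating one annotator as the reference) is our assumption, chosen to match the F1 vocabulary above. Unlike F1, kappa uses true negatives through both the observed agreement $p_o$ and the chance agreement $p_e$:

```math
N = TP + FP + FN + TN, \qquad p_o = \frac{TP + TN}{N}

p_e = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N^2}, \qquad \kappa = \frac{p_o - p_e}{1 - p_e}
```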
A simple implementation is available in scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html
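A minimal sketch of calling it, assuming scikit-learn is installed and that each annotator's output has been reduced to one binary label per note. The labels below are made up for illustration, and `kappa_from_counts` is a hypothetical helper (not the chart-review placeholder) that applies the 2×2 formula directly, as a cross-check on the library result:

```python
from sklearn.metrics import cohen_kappa_score


def kappa_from_counts(tp: int, fp: int, fn: int, tn: int) -> float:
    """Unweighted Cohen's kappa from a 2x2 agreement table (hypothetical helper)."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # chance agreement: product of the two annotators' marginal rates
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    return (p_o - p_e) / (1 - p_e)


# Illustrative per-note labels from two annotators (1 = condition present).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
annotator = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Tally the 2x2 agreement table from the paired labels.
pairs = list(zip(reference, annotator))
tp = sum(r == 1 and a == 1 for r, a in pairs)
fp = sum(r == 0 and a == 1 for r, a in pairs)
fn = sum(r == 1 and a == 0 for r, a in pairs)
tn = sum(r == 0 and a == 0 for r, a in pairs)

sk_kappa = cohen_kappa_score(reference, annotator)  # unweighted by default
manual_kappa = kappa_from_counts(tp, fp, fn, tn)
print(sk_kappa, manual_kappa)  # both about 0.583 (7/12) for these labels
```

Here $p_o = 0.8$ and $p_e = 0.52$, so $\kappa = 0.28 / 0.48 = 7/12$. `cohen_kappa_score` is symmetric in its two label arguments, so it does not matter which annotator is passed first.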
The scikit-learn implementation could replace our existing placeholder, since it performs essentially the same statistical calculation. https://github.com/smart-on-fhir/chart-review/blob/827dda819f899b0fec3e2e41677b1b842c0a3caf/chart_review/kappa.py#L4