bschilder closed this 5 months ago
I've fixed this by changing `search_hpo`:
https://github.com/neurogenomics/HPOExplorer/blob/0e23969bb11c02679c0267369b1463bff6be22ee/R/search_hpo.R#L42
It now searches for:
"Cancer",
"malignant",
"carcinoma"
This narrows the search down from "Neoplasms", which was too broad and included lots of non-malignant growths. As a consequence, the true positive rate for cancer (`checks$true_pos_rate`) is now >96%.
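For anyone following along, here is a minimal sketch of what keyword-based matching of this kind could look like in base R. It is not the actual `search_hpo` implementation: the phenotype names below are made up for illustration, and the case-insensitive matching is an assumption.

```r
# Hypothetical phenotype names, for illustration only (not pulled from the HPO).
phenotype_names <- c(
  "Lipoma",
  "Breast carcinoma",
  "Uterine leiomyoma",
  "Malignant mesothelioma",
  "Colon cancer"
)

# The three keywords mentioned above, combined into one regex alternation.
keywords <- c("cancer", "malignant", "carcinoma")
pattern <- paste(keywords, collapse = "|")

# Case-insensitive match (an assumption; the real function may differ).
is_cancer <- grepl(pattern, phenotype_names, ignore.case = TRUE)
cancer_terms <- phenotype_names[is_cancer]

print(cancer_terms)
```

In this toy example the benign growths ("Lipoma", "Uterine leiomyoma") are dropped, which is the narrowing effect described above.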
When we ask ChatGPT whether something causes cancer, we're asking about malignant neoplasms. Technically speaking, cancers are always malignant. My `HPOExplorer::search_hpo` function considers both malignant and benign neoplasms, hence the lower recall score for the "cancer" annotations. @NathanSkene said @KittyMurphy had done some analyses showing this was indeed the reason for the low recall. Could you point me towards these, @KittyMurphy? If so, I can adjust `HPOExplorer::search_hpo` to be more specific, thus improving our recall.
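To make the reasoning concrete, here is a toy sketch of how counting benign neoplasms as "cancer" in the ground truth pulls recall down. All values are invented for illustration, and the `gpt_cancer` and `malignant` columns are hypothetical, not fields from our actual annotations.

```r
# Toy annotations: every row is a neoplasm, but only some are malignant.
annot <- data.frame(
  name = c("Breast carcinoma", "Colon cancer", "Lipoma",
           "Uterine leiomyoma", "Malignant mesothelioma"),
  malignant  = c(TRUE, TRUE, FALSE, FALSE, TRUE),   # hypothetical ground truth
  gpt_cancer = c(TRUE, TRUE, FALSE, FALSE, FALSE),  # hypothetical GPT answers
  stringsAsFactors = FALSE
)

# Recall (true positive rate) = TP / (TP + FN).
recall <- function(truth, predicted) {
  sum(truth & predicted) / sum(truth)
}

# Broad ground truth: every neoplasm counts as "cancer".
broad_truth <- rep(TRUE, nrow(annot))
recall_broad <- recall(broad_truth, annot$gpt_cancer)

# Narrow ground truth: only malignant neoplasms count as "cancer".
recall_narrow <- recall(annot$malignant, annot$gpt_cancer)

print(c(broad = recall_broad, narrow = recall_narrow))
```

Under the broad definition the benign terms are scored as false negatives even though answering "not cancer" for them is correct, so recall comes out lower than under the narrow definition.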