Open wammar opened 6 months ago
Bridger's proposal: Make a list of 10 jargon terms that you definitely want to see covered and 10 medical but non-jargon terms that you definitely don't want to see covered, then find the resource that maximizes (# of test jargon terms covered) - (# of test non-jargon terms covered).
Example: If Resource A covers 7 of the jargon terms and 3 of the non-jargon terms, its score is 7 - 3 = 4.
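To make the scoring concrete, here is a rough sketch of how it could be computed. The function name `coverage_score`, the case-insensitive exact matching, and the placeholder term lists are all my assumptions, not something we have agreed on; real resources may need fuzzier matching.

```python
def coverage_score(resource_terms, jargon_terms, non_jargon_terms):
    """Score = (# test jargon terms covered) - (# test non-jargon terms covered).

    Matching is case-insensitive exact match, which is an assumption.
    """
    covered = {t.strip().lower() for t in resource_terms}
    jargon_hits = sum(1 for t in jargon_terms if t.strip().lower() in covered)
    non_jargon_hits = sum(1 for t in non_jargon_terms if t.strip().lower() in covered)
    return jargon_hits - non_jargon_hits


# Placeholder test lists (hypothetical; the real lists would be hand-picked).
jargon = [f"jargon term {i}" for i in range(1, 11)]
non_jargon = [f"plain term {i}" for i in range(1, 11)]

# A made-up "Resource A" that covers 7 jargon terms and 3 non-jargon terms.
resource_a = jargon[:7] + non_jargon[:3]

print(coverage_score(resource_a, jargon, non_jargon))  # 4
```

With several candidate resources, we would compute this score for each and pick the highest.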
The MedlinePlus XML files I found give fewer than 2000 terms and might require a bit of data cleaning to remove parenthetical acronyms (e.g., alcohol use disorder (AUD)) and non-jargon terms (e.g., dog bites).
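The cleaning step could look something like the sketch below. It assumes the terms have already been pulled out of the XML into plain strings, and that acronyms appear as a trailing all-caps parenthetical; both are assumptions I have not checked against the full files. The helper names and the `exclude` set are hypothetical, and deciding which terms count as non-jargon would still be manual.

```python
import re

# Trailing parenthetical of 2-10 capital letters/digits, e.g. " (AUD)".
# The exact pattern is a guess at how acronyms appear in the data.
_ACRONYM_SUFFIX = re.compile(r"\s*\(([A-Z][A-Z0-9]{1,9})\)\s*$")


def strip_acronym(term):
    """Return (term without trailing acronym, acronym or None)."""
    match = _ACRONYM_SUFFIX.search(term)
    if match:
        return term[:match.start()].strip(), match.group(1)
    return term.strip(), None


def clean_terms(terms, exclude=frozenset()):
    """Strip parenthetical acronyms and drop hand-listed non-jargon terms."""
    excluded = {e.lower() for e in exclude}
    cleaned = []
    for term in terms:
        name, _acronym = strip_acronym(term)
        if name and name.lower() not in excluded:
            cleaned.append(name)
    return cleaned


print(strip_acronym("alcohol use disorder (AUD)"))
# ('alcohol use disorder', 'AUD')
print(clean_terms(["alcohol use disorder (AUD)", "dog bites"], exclude={"dog bites"}))
# ['alcohol use disorder']
```

Returning the acronym separately keeps the option of adding it to the lookup table as its own entry.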
I think we should look for better resources, or at least state our goals more clearly, especially our target size for the lookup table.