higgood / med-jargon-explain-inator

Forking this so that we can associate tasks with the relevant repo. This project is owned by all team members, not by HIGG. HIGG sponsors it only to facilitate project management.

Identify ~2 comprehensive medical jargon lists/databases #7

Open wammar opened 2 weeks ago

wammar commented 2 weeks ago

The MedlinePlus XML files I found give fewer than 2000 terms and might require a bit of data cleaning to remove parenthetical acronyms (e.g., alcohol use disorder (AUD)) and non-jargon terms (e.g., dog bites).
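A rough sketch of what that cleaning could look like, assuming the terms have already been extracted from the XML into plain strings. The function names, the acronym pattern (a trailing parenthetical of 2 to 10 uppercase letters, digits, or hyphens), and the hand-maintained `non_jargon` set are all assumptions for illustration, not anything MedlinePlus provides:

```python
import re

# Assumption: acronyms show up as a trailing parenthetical such as "(AUD)".
_ACRONYM_SUFFIX = re.compile(r"\s*\(([A-Z][A-Z0-9-]{1,9})\)\s*$")


def strip_acronym(term):
    """Split 'alcohol use disorder (AUD)' into ('alcohol use disorder', 'AUD').

    Returns (term, None) when no trailing acronym is found.
    """
    match = _ACRONYM_SUFFIX.search(term)
    if match is None:
        return term.strip(), None
    return term[:match.start()].strip(), match.group(1)


def clean_terms(raw_terms, non_jargon=frozenset()):
    """Strip acronyms and drop terms listed in a lowercase non-jargon set.

    The non_jargon set is hypothetical and would have to be curated by hand.
    """
    cleaned = []
    for raw in raw_terms:
        term, _acronym = strip_acronym(raw)
        if term.lower() in non_jargon:
            continue
        cleaned.append(term)
    return cleaned


if __name__ == "__main__":
    raw = ["alcohol use disorder (AUD)", "dog bites"]
    print(clean_terms(raw, non_jargon={"dog bites"}))
```

Keeping the stripped acronym around (rather than discarding it) might be worthwhile, since the acronym is arguably jargon in its own right.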

I think we should identify other, better resources, or at least state our goals more clearly, especially the target size of the lookup table.

wammar commented 2 weeks ago

Bridger's proposal: Make a list of 10 jargon terms that you definitely want to see covered, and 10 medical but non-jargon terms that you definitely don't want to see covered, then find the resource that maximizes (# of test jargon terms covered) - (# of test non-jargon terms covered).

Example: If Resource A covers 7 of the jargon terms and 3 of the non-jargon terms, its score is 7 - 3 = 4.
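The scoring rule could be sketched roughly as below. `normalize` and `coverage_score` are hypothetical names, matching is assumed to be exact after lowercasing and collapsing whitespace, and the test terms are placeholders rather than real picks:

```python
def normalize(term):
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(term.lower().split())


def coverage_score(resource_terms, jargon_tests, non_jargon_tests):
    """Score = (# test jargon terms covered) - (# test non-jargon terms covered)."""
    covered = {normalize(t) for t in resource_terms}
    jargon_hits = sum(normalize(t) in covered for t in jargon_tests)
    non_jargon_hits = sum(normalize(t) in covered for t in non_jargon_tests)
    return jargon_hits - non_jargon_hits


if __name__ == "__main__":
    # Placeholder test lists; the real ones would be the 10 + 10 hand-picked terms.
    jargon_tests = [f"jargon term {i}" for i in range(10)]
    non_jargon_tests = [f"plain term {i}" for i in range(10)]

    # Resource A covers 7 of the jargon terms and 3 of the non-jargon terms.
    resource_a = jargon_tests[:7] + non_jargon_tests[:3]
    print(coverage_score(resource_a, jargon_tests, non_jargon_tests))  # prints 4
```

To compare candidates, the same function would be run once per resource and the highest score picked. Exact matching is probably too strict for multi-word terms or plural forms, so the normalization step is the part most likely to need tuning.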