som-shahlab / ehr_ml

Code for doing machine learning with various EHRs
MIT License

Siyun/string description #26

Closed Siyun96 closed 2 years ago

Siyun96 commented 2 years ago

Implementation for issue #24.

For each code, we look up its text description in the STR column of MRCONSO.RRF. Since a code can have multiple entries, we keep only the shortest string (of at least 4 characters) among the entries that satisfy the query condition: "SUPPRESS" NOT IN ('O', 'Y') AND ISPREF='Y' AND LAT='ENG'. Codes with no description are mapped to NO_DEF. The mapping is stored as an OntologyCodeDictionary in ontology.db.
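The selection rule above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names `shortest_descriptions` and `describe` are hypothetical, the "SAB/CODE" key format is an assumption about how codes are identified, and ties between equally short strings are resolved by keeping the first one seen, which the description above does not specify. The column positions follow the standard pipe-delimited MRCONSO.RRF layout.

```python
from typing import Dict, Iterable

# Zero-based column positions in MRCONSO.RRF (pipe-delimited UMLS layout).
LAT, ISPREF, SAB, CODE, STR, SUPPRESS = 1, 6, 11, 13, 14, 16

NO_DEF = "NO_DEF"   # placeholder for codes without a usable description
MIN_LENGTH = 4      # descriptions shorter than this are ignored


def shortest_descriptions(lines: Iterable[str]) -> Dict[str, str]:
    """Map each code to its shortest qualifying STR value.

    Keys use a "SAB/CODE" format, which is an assumption made for this sketch.
    """
    best: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= SUPPRESS:
            continue  # skip malformed rows
        # Query condition: SUPPRESS NOT IN ('O', 'Y') AND ISPREF='Y' AND LAT='ENG'
        if fields[LAT] != "ENG" or fields[ISPREF] != "Y":
            continue
        if fields[SUPPRESS] in ("O", "Y"):
            continue
        text = fields[STR]
        if len(text) < MIN_LENGTH:
            continue
        key = f"{fields[SAB]}/{fields[CODE]}"
        # Keep the shortest string; on a tie, the first one seen wins.
        if key not in best or len(text) < len(best[key]):
            best[key] = text
    return best


def describe(code: str, descriptions: Dict[str, str]) -> str:
    """Return the stored description, or NO_DEF when the code has none."""
    return descriptions.get(code, NO_DEF)
```

In practice the input would be the lines of MRCONSO.RRF opened as UTF-8 text, for example `shortest_descriptions(open(path, encoding="utf-8"))`, where `path` points at a local UMLS installation.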

create_timelines.py shows an example of how to query a per-patient timeline object with text descriptions.

test_patient.json shows example output for a SynPUF patient.