som-shahlab / ehr_ml

Code for doing machine learning with various EHRs
MIT License

More example usage of the many labeling helper functions #15

Open birjuspatel opened 3 years ago

birjuspatel commented 3 years ago

In the spirit of CLMBR becoming the de facto way of doing rich representation learning on EHR data at Stanford, I think we'll need to provide a bit more guidance on how to use all the labeling helper functions in ehr_ml (https://github.com/som-shahlab/ehr_ml/blob/master/ehr_ml/labeler.py). Apologies if you've already done that somewhere and I missed it.

For example, following tutorial 3a, I was curious how you would find all the descendants of the diabetes ICD codes. I saw you did something similar with the OpioidOverdoseLabeler. Perhaps you might just direct everyone to read the source code for now, until you decide how much effort you want to put into writing more API documentation or generating more examples.
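To make the descendant question concrete, here is a minimal sketch of the general technique: a breadth-first walk over a parent-to-children map. It does not use the ehr_ml API; `TOY_CHILDREN` and `get_descendants` are hypothetical names, and the hierarchy is a hand-written toy fragment rather than a real ontology export.

```python
from collections import deque

# Hypothetical toy hierarchy (parent code -> direct children).
# This is NOT loaded from ehr_ml; a real ICD hierarchy would come from an ontology file.
TOY_CHILDREN = {
    "E08-E13": ["E10", "E11"],
    "E10": ["E10.1", "E10.9"],
    "E11": ["E11.2", "E11.9"],
    "E11.2": ["E11.21", "E11.22"],
}


def get_descendants(root, children):
    """Return every code reachable from `root`, excluding `root` itself."""
    seen = set()
    queue = deque([root])
    while queue:
        code = queue.popleft()
        for child in children.get(code, []):
            if child not in seen:  # guard against codes listed under two parents
                seen.add(child)
                queue.append(child)
    return seen


# The set a labeler would match against: the root block plus all of its descendants.
diabetes_codes = get_descendants("E08-E13", TOY_CHILDREN) | {"E08-E13"}
```

A labeler could then test each event's code for membership in `diabetes_codes`, which is presumably close in spirit to what the OpioidOverdoseLabeler does, although that is an assumption and not something checked against its source.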

Another labeling task that I'm trying to wrap my head around is how to define a cohort not by a time delta from the label, but from some clinically relevant timepoint. For example, I see helper functions for fixed and infinite time horizons, and for requiring 1 year of history, but these appear to be centered on the date of the label. How does one build timelines where the prediction date is set to some clinical event (date of admission to the hospital, or date of first appointment with an oncologist after a cancer diagnosis)? The timeline would then have to ensure that only data prior to that date goes into the extracted embeddings. I gather tutorial 3b does this manually by providing the date offsets.
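As an illustration of the behavior being asked about, the sketch below anchors the prediction time on an index event and keeps only earlier data as model input. It is plain Python, not ehr_ml code; `Event`, `split_at_index`, and the codes `ADMISSION`, `SURGERY`, and `LAB` are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    day: date
    code: str


def split_at_index(events, index_code, outcome_code, horizon_days):
    """Anchor the prediction time on the first `index_code` event.

    Returns (history, label), or None if the patient never has the index event.
    `history` holds only events strictly before the index day; `label` is True
    if `outcome_code` occurs from the index day through `horizon_days` later.
    """
    ordered = sorted(events, key=lambda e: e.day)
    index_day = next((e.day for e in ordered if e.code == index_code), None)
    if index_day is None:
        return None
    history = [e for e in ordered if e.day < index_day]
    window_end = index_day + timedelta(days=horizon_days)
    label = any(
        e.code == outcome_code and index_day <= e.day <= window_end
        for e in ordered
    )
    return history, label


timeline = [
    Event(date(2020, 1, 1), "LAB"),
    Event(date(2020, 1, 10), "ADMISSION"),
    Event(date(2020, 1, 11), "LAB"),
    Event(date(2020, 1, 12), "SURGERY"),
]
result = split_at_index(timeline, "ADMISSION", "SURGERY", horizon_days=7)
```

Whether same-day events should count as history is a design choice; the strict `<` here excludes them to avoid leaking information recorded on the index day.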

Perhaps this is a matter of vocabulary: labeling functions could be used to generate the cohort start/end times (e.g., set hospital admission as the label, even though it's truly the prediction time), and then you could override them with your actual label of interest (e.g., operative procedure during the hospital stay) when you build your model.
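A rough sketch of that two-step idea, assuming labels can be treated as simple (patient, prediction time, value) records: the first pass fixes the prediction times, and the second pass overwrites only the values. `Label` and `relabel` are hypothetical names and do not correspond to ehr_ml classes.

```python
from datetime import date
from typing import NamedTuple


class Label(NamedTuple):
    patient_id: int
    prediction_day: date
    value: bool


def relabel(cohort, positive_patients):
    """Keep each prediction time, but replace the value with the real outcome."""
    return [
        lab._replace(value=lab.patient_id in positive_patients)
        for lab in cohort
    ]


# Step 1 (hypothetical output of an "admission" labeler): value is a placeholder.
cohort = [
    Label(1, date(2020, 1, 10), True),
    Label(2, date(2020, 3, 5), True),
]

# Step 2: patients who actually had the outcome of interest during the stay.
patients_with_procedure = {2}
final_labels = relabel(cohort, patients_with_procedure)
```

The prediction days pass through unchanged, so whatever featurization is keyed on them stays aligned with the admission date rather than the procedure date.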