ncats / translator-workflows

Workflow 4: Pick a good collection of EHRs or individual clinical records as input for Workflow 4 #30

Closed jaredroach closed 5 years ago

jaredroach commented 5 years ago

Please see the background for this problem in https://github.com/ncats/translator-workflows/issues/28. In short, we need to identify a set of clinical records and modules that allow us to output a "de-identified" dataset that includes attributes from both the whole cohort and the sub-cohort. If considered adequately de-identified, a table with one row per patient index and one column per attribute would be a superb answer to this question. However, I suspect that may be asking for too much. A perfectly fine output would be a list of phenotypes that are differentially enriched or impoverished in the sub-cohort compared to the full cohort.

Any reasonable choice of cohort / sub-cohort pair would be welcome. However, for starters, I suggest one of these two:

  1. Coma / hypoglycemic coma
  2. Hypertension / pheochromocytoma

karafecho commented 5 years ago

Green/Gamma can tackle the following using ICEES: "A perfectly fine output would be a list of phenotypes that are differentially enriched or impoverished in the sub-cohort compared to the full cohort". The caveat is that ICEES is currently restricted to a cohort of patients with asthma-like conditions. I think that's okay, however.
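For concreteness, here is a minimal sketch of what such an enriched / impoverished phenotype list could look like. It does not use the ICEES API; the cohort dictionary, the phenotype labels, and the function names are all hypothetical, and the hypergeometric test is just one reasonable choice for comparing a sub-cohort drawn from the full cohort.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n patients are drawn from N, of whom K carry the phenotype."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def enrichment(cohort, sub_ids):
    """cohort: patient index -> set of phenotype labels (hypothetical layout).
    sub_ids: indices of the patients that form the sub-cohort."""
    N, n = len(cohort), len(sub_ids)
    phenotypes = set().union(*cohort.values())
    rows = []
    for p in sorted(phenotypes):
        K = sum(p in feats for feats in cohort.values())  # carriers in full cohort
        k = sum(p in cohort[i] for i in sub_ids)          # carriers in sub-cohort
        rows.append({
            "phenotype": p,
            "sub_fraction": k / n,
            "full_fraction": K / N,
            "p_enriched": hypergeom_tail(k, K, n, N),
            # P(X <= k) = 1 - P(X >= k + 1)
            "p_impoverished": 1.0 - hypergeom_tail(k + 1, K, n, N),
        })
    return rows

# Toy, made-up data: six patient indices, three of them in the sub-cohort.
cohort = {
    0: {"wheeze", "obesity"},
    1: {"wheeze"},
    2: {"wheeze", "eczema"},
    3: {"obesity"},
    4: {"eczema"},
    5: set(),
}
sub_ids = {0, 1, 2}
results = enrichment(cohort, sub_ids)
for row in results:
    print(row)
```

With real data the p-values would need a multiple-testing correction, and very small counts may themselves raise de-identification concerns.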

jh111 commented 5 years ago

Possible approach: Given an ICEES user-selected cohort, apply domain-agnostic fingerprint distance metrics to detect clusters (part of Workflow 5), then report results (considering enrichment only on ICEES-exposed features). One limitation with this is that sub-stratification is based on the entire EHR record, so it's possible that the enriched feature will be something other than the ones exposed. The user might see no difference in enrichment, or might see a difference but not be able to interpret it without access to enrichment on the complete original records.
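A toy sketch of this limitation, under loud assumptions: Jaccard distance on feature sets stands in for the fingerprint distance metric, a simple distance threshold stands in for the clustering step, and the records, feature names, and functions are all made up for illustration. Clusters are formed on the full record, but frequencies are reported only for the exposed features.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty records are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def threshold_clusters(records, max_dist):
    """Link patients whose distance is <= max_dist and return the connected
    components (single-linkage clustering cut at max_dist)."""
    parent = {pid: pid for pid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if jaccard_distance(records[a], records[b]) <= max_dist:
            parent[find(a)] = find(b)

    clusters = {}
    for pid in records:
        clusters.setdefault(find(pid), set()).add(pid)
    return sorted(clusters.values(), key=lambda c: sorted(c))

def exposed_frequencies(records, members, exposed):
    """Fraction of `members` carrying each exposed feature."""
    return {f: sum(f in records[p] for p in members) / len(members)
            for f in sorted(exposed)}

# Hypothetical records: h1..h4 are features in the full EHR that are NOT
# exposed; "x" is the only exposed feature.
records = {
    1: {"h1", "h2", "x"},
    2: {"h1", "h2"},
    3: {"h1", "h2", "x"},
    4: {"h3", "h4", "x"},
    5: {"h3", "h4"},
    6: {"h3", "h4", "x"},
}
exposed = {"x"}

clusters = threshold_clusters(records, max_dist=0.5)
freqs = [exposed_frequencies(records, c, exposed) for c in clusters]
for members, freq in zip(clusters, freqs):
    print(sorted(members), freq)
```

In this toy case the two clusters are separated entirely by the hidden features, so the exposed feature has the same frequency in both and the user would see no difference in enrichment.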