jaredroach closed this issue 5 years ago.
Green/Gamma can tackle the following using ICEES: "A perfectly fine output would be a list of phenotypes that are differentially enriched or impoverished in the sub-cohort compared to the full cohort." The caveat is that ICEES is currently restricted to a cohort of patients with asthma-like conditions. I think that's okay, however.
Possible approach: given a user-selected ICEES cohort, apply domain-agnostic fingerprint distance metrics to detect clusters (part of Workflow 5), then report enrichment results, considering only ICEES-exposed features. One limitation is that sub-stratification is based on the entire EHR record, so the enriched feature may be something other than the exposed ones. The user might see no difference in enrichment, or might see a difference but be unable to interpret it without access to enrichment on the complete original records.
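A minimal Python sketch of that approach, assuming each patient's EHR record has been reduced to a set of feature codes (a "fingerprint"). Jaccard distance with single-linkage threshold clustering is only a stand-in for the Workflow 5 fingerprint metrics, which are not specified here; the function names, the feature codes, and the `max_dist` threshold are all hypothetical. Note that clustering runs on the full fingerprint and only afterwards is each record projected onto the ICEES-exposed features, which is where the limitation above comes from.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - len(a & b) / len(a | b); two empty fingerprints count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def threshold_clusters(fingerprints: dict, max_dist: float) -> list:
    """Single-linkage clustering: link every pair of patients whose Jaccard
    distance is <= max_dist and return the connected components."""
    parent = {pid: pid for pid in fingerprints}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(fingerprints, 2):
        if jaccard_distance(fingerprints[p], fingerprints[q]) <= max_dist:
            parent[find(p)] = find(q)

    groups = {}
    for pid in fingerprints:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(g) for g in groups.values())


def exposed_view(fingerprints: dict, exposed: set) -> dict:
    """Project full-record fingerprints onto the (hypothetical) ICEES-exposed features."""
    return {pid: feats & exposed for pid, feats in fingerprints.items()}


if __name__ == "__main__":
    # Hypothetical toy records: patient index -> set of feature codes.
    records = {
        "p1": {"asthma", "obesity", "ozone_high"},
        "p2": {"asthma", "obesity", "ozone_high", "gerd"},
        "p3": {"copd", "smoker"},
        "p4": {"copd", "smoker", "ozone_high"},
    }
    clusters = threshold_clusters(records, max_dist=0.5)
    print(clusters)  # [['p1', 'p2'], ['p3', 'p4']]
    view = exposed_view(records, {"asthma", "ozone_high"})
    for cluster in clusters:
        print(cluster, [sorted(view[pid]) for pid in cluster])
```

Single linkage was picked here only because it needs no cluster count in advance; it tends to chain dissimilar records together, so a real run would probably want a different linkage or a dedicated clustering library.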
Please see the background for this problem in https://github.com/ncats/translator-workflows/issues/28. In short, we need to identify a set of clinical records and modules that allow us to output a "de-identified" dataset that includes attributes from the whole cohort and the sub-cohort. If it were considered adequately de-identified, a table with rows of patient indices and columns of attributes would be a superb answer to this question. However, I suspect that may be asking too much. A perfectly fine output would be a list of phenotypes that are differentially enriched or impoverished in the sub-cohort compared to the full cohort.
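For the "perfectly fine output", one possible sketch is a per-phenotype one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2x2 table of sub-cohort vs. remainder by phenotype present vs. absent. It assumes the sub-cohort is a subset of the full cohort and that each patient is a de-identified index mapped to a set of binary phenotype codes; the function names and the toy data are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x) when drawing n of N patients, K of whom carry the phenotype."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def enrichment_table(full_cohort: dict, sub_ids) -> dict:
    """Per-phenotype enrichment of a sub-cohort relative to its full cohort.

    full_cohort: de-identified patient index -> set of phenotype codes.
    sub_ids: non-empty collection of indices forming the sub-cohort
             (assumed to be a subset of full_cohort).
    Returns phenotype -> (sub_frac, full_frac, ratio, p_enriched, p_depleted),
    where the p-values are uncorrected one-sided hypergeometric tails.
    """
    sub_ids = set(sub_ids)
    N, n = len(full_cohort), len(sub_ids)
    phenotypes = set().union(*full_cohort.values())
    table = {}
    for f in sorted(phenotypes):
        K = sum(f in feats for feats in full_cohort.values())  # carriers in full cohort
        k = sum(f in full_cohort[pid] for pid in sub_ids)      # carriers in sub-cohort
        lo, hi = max(0, n - (N - K)), min(K, n)                # support of X
        p_enriched = sum(hypergeom_pmf(x, N, K, n) for x in range(k, hi + 1))
        p_depleted = sum(hypergeom_pmf(x, N, K, n) for x in range(lo, k + 1))
        sub_frac, full_frac = k / n, K / N
        table[f] = (sub_frac, full_frac, sub_frac / full_frac,
                    min(1.0, p_enriched), min(1.0, p_depleted))
    return table


if __name__ == "__main__":
    # Hypothetical toy cohort; the sub-cohort is p1 and p2.
    cohort = {
        "p1": {"asthma", "obesity"},
        "p2": {"asthma", "obesity"},
        "p3": {"asthma", "rhinitis"},
        "p4": {"obesity", "rhinitis"},
        "p5": {"rhinitis"},
        "p6": {"rhinitis"},
    }
    for f, (s, a, r, pe, pd) in enrichment_table(cohort, ["p1", "p2"]).items():
        print(f"{f:10s} sub={s:.2f} full={a:.2f} ratio={r:.2f} "
              f"p_enriched={pe:.3f} p_depleted={pd:.3f}")
```

Sorting the table by the smaller of the two p-values would give the requested list of enriched and impoverished phenotypes; with many phenotypes, a correction such as Benjamini-Hochberg would be needed before reading anything into individual p-values.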
Any reasonable choice of cohort/sub-cohort pair would be welcome. However, for starters, I suggest one of these two: