Open karafecho opened 5 years ago
Scroll down for updates to plan
Plan for implementation of Workflow 4:
Note that other paths are possible and may be attempted.
Use COHD to stratify/cluster by Sex. See COHD UI and a query template plus a specific instance of Workflow 5. Retrieve top 20 diagnoses (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "disease -> gene -> biological process/activity -> chemical substance".
Use Clinical Profiles to identify/create sub-cohorts of males and females with asthma. Retrieve top 20 diagnoses (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "disease -> gene -> biological process/activity -> phenotype".
Note re ICEES: We will need to capture directionality as part of the output for the workflow. By "directionality", I mean that we need to capture which stratum is "enriched" for a given phenotype (i.e., has a higher percentage of patients with XXX). The chi-square statistic that ICEES provides indicates whether groups or bins differ, but it says nothing about the direction of the difference. Relative risks and odds ratios may suffice.
A. ICEES example query
Input:
Feature variables: AvgDailyPM2.5Exposures < 3, TotalEDInpatientVisits < 2
Version of data: 1.0.0
Table: patient
Year: 2010
Cohort ID: COHORT:22
Output:*
+----------------------------+------------------------------+-------------------------------+---------+
| feature                    | TotalEDInpatientVisits < 2   | TotalEDInpatientVisits >= 2   |         |
+============================+==============================+===============================+=========+
| AvgDailyPM2.5Exposure < 3  | 297 91.10%                   | 29 8.90%                      | 326     |
|                            | 5.85% 4.66%                  | 2.22% 0.45%                   | 5.11%   |
+----------------------------+------------------------------+-------------------------------+---------+
| AvgDailyPM2.5Exposure >= 3 | 4776 78.90%                  | 1277 21.10%                   | 6053    |
|                            | 94.15% 74.87%                | 97.78% 20.02%                 | 94.89%  |
+----------------------------+------------------------------+-------------------------------+---------+
|                            | 5073                         | 1306                          | 6379    |
|                            | 79.53%                       | 20.47%                        | 100.00% |
+----------------------------+------------------------------+-------------------------------+---------+

+-------------+---------------+
| p_value     | chi_squared   |
+=============+===============+
| 3.16593e-06 | 28.2841       |
+-------------+---------------+

*AvgDailyPM2.5Exposure < 3 range: 1.58, 9.63 µg/m3; AvgDailyPM2.5Exposure >= 3 range: 9.63, 17.33 µg/m3; TotalEDInpatientVisits = # emergency department or inpatient visits for a respiratory issue over a one-year ‘study’ period (the example here is for calendar year 2010).
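To illustrate the note on directionality, here is a minimal Python sketch that derives a relative risk and an odds ratio from the four cell counts in the ICEES 2x2 table above. The function and variable names are hypothetical and not part of ICEES; the only inputs taken from the source are the counts 297, 29, 4776, and 1277.

```python
# Cell counts copied from the ICEES example output above.
# Rows: AvgDailyPM2.5Exposure < 3 vs >= 3.
# Columns: TotalEDInpatientVisits < 2 vs >= 2.
low_exp_few_visits, low_exp_many_visits = 297, 29
high_exp_few_visits, high_exp_many_visits = 4776, 1277


def relative_risk(a, b, c, d):
    """Proportion with the first-column outcome in row 1 divided by that in row 2."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


counts = (low_exp_few_visits, low_exp_many_visits,
          high_exp_few_visits, high_exp_many_visits)
rr = relative_risk(*counts)
or_ = odds_ratio(*counts)

# Direction: a ratio above 1 means the first row has the higher proportion
# of patients with TotalEDInpatientVisits < 2 (a ratio of exactly 1 would
# mean no direction; this sketch does not treat that case specially).
enriched = "AvgDailyPM2.5Exposure < 3" if rr > 1 else "AvgDailyPM2.5Exposure >= 3"
print(f"RR = {rr:.3f}, OR = {or_:.3f}, enriched stratum: {enriched}")
```

On these counts the relative risk is about 1.15 and the odds ratio about 2.74, so the lower-exposure stratum is the one enriched for fewer than two visits, which is the direction the chi-square value alone does not reveal.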
B. COHD example queries
Input: Asthma (ID #317009) and Black or African American (ID #8516)
Output:
{
  "concept_2_count": 208438,
  "concept_id_1": 317009,
  "concept_id_2": 8516,
  "concept_pair_count": 11716,
  "dataset_id": 2,
  "relative_frequency": 0.05620856081904451
}
Input: Asthma (ID #317009) and White (ID #8527)
Output:
{
  "concept_2_count": 601167,
  "concept_id_1": 317009,
  "concept_id_2": 8527,
  "concept_pair_count": 29913,
  "dataset_id": 2,
  "relative_frequency": 0.049758220261591206
}
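The two COHD outputs above can be compared directly to get a direction for the difference. The sketch below is illustrative only: it assumes, based on the numbers shown, that relative_frequency equals concept_pair_count divided by concept_2_count, and the helper name is hypothetical rather than part of the COHD API.

```python
# Values copied from the two COHD example outputs above.
asthma_black = {"concept_id_2": 8516, "concept_2_count": 208438, "concept_pair_count": 11716}
asthma_white = {"concept_id_2": 8527, "concept_2_count": 601167, "concept_pair_count": 29913}


def relative_frequency(response):
    """Fraction of patients with concept 2 who also have concept 1.

    ASSUMPTION: this reproduces the relative_frequency field as
    concept_pair_count / concept_2_count, inferred from the example numbers.
    """
    return response["concept_pair_count"] / response["concept_2_count"]


freq_black = relative_frequency(asthma_black)
freq_white = relative_frequency(asthma_white)

# A ratio above 1 means asthma is relatively more frequent in the first group.
frequency_ratio = freq_black / freq_white
print(f"{freq_black:.4f} vs {freq_white:.4f}, ratio = {frequency_ratio:.3f}")
```

For these two responses the ratio comes out at roughly 1.13, i.e., the asthma concept is somewhat more frequent among patients with concept 8516 than among those with concept 8527 in this dataset.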
C. Clinical Profiles links
See Green/Gamma TranQL implementation of Workflow 5, which is related to Workflow 4, here.
WORKFLOW INPUT:
See ICEES_FeatureVariables and ICEES_Identifiers here for diagnoses. Note that these docs are updated as new variables are added to the ICEES integrated feature tables.
WORKFLOW (Gamma) QUESTION TEMPLATE:
Note that the second gene hop was added per ROBOKOP Neo4j constraints. If we can avoid this, great; if not, that's fine, too.
{
  "name": "Gamma WF4 template",
  "natural_question": "disease or phenotypic feature to gene to biological process/activity to gene to drug",
  "notes": "",
  "machine_question": {
    "nodes": [
      { "id": "n0", "curie": "MONDO:0008300", "name": "ObesityDx", "type": "disease or phenotypic feature" },
      { "id": "n1", "type": "gene" },
      { "id": "n2", "type": "biological_process_or_activity" },
      { "id": "n3", "type": "gene" },
      { "id": "n4", "type": "drug" }
    ],
    "edges": [
      { "id": "e0", "source_id": "n0", "target_id": "n1" },
      { "id": "e1", "source_id": "n1", "target_id": "n2" },
      { "id": "e2", "source_id": "n2", "target_id": "n3" }
    ]
  }
}
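Since the template is meant to be run once per ICEES phenotype/diagnosis, a small generator can fill it in for each CURIE. The following Python sketch is one possible way to do that; build_wf4_question is a hypothetical helper, and it assumes the question is a linear chain in which every node links to the next. Under that assumption it also emits an edge e3 from n3 to n4, which the template above does not list.

```python
import json

# Node types in the order used by the Gamma WF4 template above.
NODE_TYPES = [
    "disease or phenotypic feature",
    "gene",
    "biological_process_or_activity",
    "gene",
    "drug",
]


def build_wf4_question(curie, name):
    """Return a WF4-style question dict for one disease/phenotype CURIE."""
    nodes = [{"id": f"n{i}", "type": t} for i, t in enumerate(NODE_TYPES)]
    # Only the first node is pinned to a specific identifier.
    nodes[0] = {"id": "n0", "curie": curie, "name": name, "type": NODE_TYPES[0]}
    # ASSUMPTION: consecutive nodes are linked, so this yields e0..e3.
    # The source template stops at e2; e3 (n3 -> n4) is added by this sketch.
    edges = [
        {"id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
        for i in range(len(nodes) - 1)
    ]
    return {
        "name": "Gamma WF4 template",
        "natural_question": "disease or phenotypic feature to gene to "
                            "biological process/activity to gene to drug",
        "notes": "",
        "machine_question": {"nodes": nodes, "edges": edges},
    }


question = build_wf4_question("MONDO:0008300", "ObesityDx")
print(json.dumps(question, indent=2))
```

Calling the helper in a loop over the diagnosis identifiers would give one question per input; whether ROBOKOP needs the extra e3 edge is left open here.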
ROBOKOP queries and RTX queries are being pre-computed for this workflow using all available ICEES phenotypes/diagnoses. Example ICEES queries are included below as an FYI:
curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"TotalEDInpatientVisits":{"operator":"<", "value":2}},"maximum_p_value":0.1}' -H "Accept: application/json"
curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"ur":{"operator":"=", "value":"U"}},"maximum_p_value":0.1}' -H "Accept: application/json"
curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"Sex2":{"operator":"=", "value":"Male"}},"maximum_p_value":0.1}' -H "Accept: application/json"
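The three curl calls above differ only in the feature, operator, and value, so they can be generated from one helper. This is a hedged sketch using only the Python standard library; build_association_request is a hypothetical name, and the URL, path, and JSON body shape are copied from the curl examples rather than from any ICEES client library.

```python
import json
import urllib.request

# Base URL copied from the curl examples above (version, table, year, cohort).
BASE_URL = "https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22"


def build_association_request(feature, operator, value, maximum_p_value=0.1):
    """Build (but do not send) a POST request for associations_to_all_features."""
    body = {
        "feature": {feature: {"operator": operator, "value": value}},
        "maximum_p_value": maximum_p_value,
    }
    return urllib.request.Request(
        f"{BASE_URL}/associations_to_all_features",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# The same three queries as the curl commands above.
icees_requests = [
    build_association_request("TotalEDInpatientVisits", "<", 2),
    build_association_request("ur", "=", "U"),
    build_association_request("Sex2", "=", "Male"),
]
for req in icees_requests:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

The sketch only constructs the requests. Sending them (for example with urllib.request.urlopen) is omitted because it depends on a running ICEES instance and on how its certificate should be handled, which the curl examples sidestep with -k.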
Green/Gamma's initial plan is to refine end-to-end execution of WF4 using TranQL, with ICEES/COHD/Clinical Profiles supplying the input for modules 1-4 and ROBOKOP/RTX/mediKanren executing modules 5-8.
Mini-hackathon was held on Friday, April 12, 12-4 pm ET. Topic: Unified Translator-compliant Clinical Knowledge Source API. Attendees: Hao Xu, Richard Zhu, Casey Ta, Steve Cos, and Kara Fecho. Event was successful. Team developed a plan of action and is moving forward with execution of the plan. The unified Translator Clinical Knowledge Source API will foster efforts on Workflows 4 and 5, as well as any efforts related to COHD, Clinical Profiles, and ICEES.
This issue relates to Green/Gamma's efforts on Workflow 4.