ncats / translator-workflows

Workflow 4: Green/Gamma Implementation of Modules 1-4 & 5-8 #44

Open karafecho opened 5 years ago

karafecho commented 5 years ago

This issue relates to Green/Gamma's efforts on Workflow 4.

karafecho commented 5 years ago

Scroll down for updates to plan

Plan for implementation of Workflow 4:

  1. Use functionality four in ICEES to stratify/cluster by Sex2 (Male vs Female) and return phenotypes that demonstrate a significant difference between the strata. The phenotypes will be diagnoses and certain demographic variables. The output list will be passed to ROBOKOP for execution of queries in the form: "disease or phenotypic feature -> gene -> biological process/activity -> chemical substance" and/or "disease or phenotypic feature -> gene -> biological process/activity -> gene -> drug".

Note that other paths are possible and may be attempted.

  2. Use COHD to stratify/cluster by Sex. See COHD UI and a query template plus a specific instance of Workflow 5. Retrieve top 20 diagnoses (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "disease -> gene -> biological process/activity -> chemical substance".

  3. Use Clinical Profiles to identify/create sub-cohorts of males and females with asthma. Retrieve top 20 diagnoses (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "disease -> gene -> biological process/activity -> phenotype".

Note re ICEES: We will need to capture directionality as part of the output for the workflow. By "directionality", I mean that we need to capture which stratum is "enriched" for a given phenotype (i.e., has a higher percentage of patients with XXX). The chi-squared statistic that ICEES provides indicates whether groups or bins differ, but it provides no information on the direction of the difference. Relative risks and odds ratios may suffice.

A. ICEES example query

Input:

Feature variables: AvgDailyPM2.5Exposure < 3, TotalEDInpatientVisits < 2
Version of data: 1.0.0
Table: patient
Year: 2010
Cohort ID: COHORT:22

Output:*

+----------------------------+------------------------------+-------------------------------+---------+
| feature                    | TotalEDInpatientVisits < 2   | TotalEDInpatientVisits >= 2   |         |
+============================+==============================+===============================+=========+
| AvgDailyPM2.5Exposure < 3  | 297 91.10%                   | 29 8.90%                      | 326     |
|                            | 5.85% 4.66%                  | 2.22% 0.45%                   | 5.11%   |
+----------------------------+------------------------------+-------------------------------+---------+
| AvgDailyPM2.5Exposure >= 3 | 4776 78.90%                  | 1277 21.10%                   | 6053    |
|                            | 94.15% 74.87%                | 97.78% 20.02%                 | 94.89%  |
+----------------------------+------------------------------+-------------------------------+---------+
|                            | 5073                         | 1306                          | 6379    |
|                            | 79.53%                       | 20.47%                        | 100.00% |
+----------------------------+------------------------------+-------------------------------+---------+

+-------------+---------------+
| p_value     | chi_squared   |
+=============+===============+
| 3.16593e-06 | 28.2841       |
+-------------+---------------+

*AvgDailyPM2.5Exposure < 3 range: 1.58, 9.63 µg/m3; AvgDailyPM2.5Exposure >= 3 range: 9.63, 17.33 µg/m3; TotalEDInpatientVisits = # emergency department or inpatient visits for a respiratory issue over a one-year ‘study’ period (the example here is for calendar year 2010).
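To illustrate the directionality point from the ICEES note above, here is a minimal sketch that derives a relative risk and an odds ratio from the four cell counts in the example output. It is not part of ICEES; the function and variable names are made up for illustration, and it assumes a plain 2x2 table with a Pearson chi-squared computed without continuity correction.

```python
# 2x2 counts copied from the ICEES example output above.
# Rows are exposure bins; columns are visit bins (< 2, >= 2).
table = {
    "AvgDailyPM2.5Exposure < 3": (297, 29),
    "AvgDailyPM2.5Exposure >= 3": (4776, 1277),
}
(low_label, (a, b)), (high_label, (c, d)) = table.items()


def chi_squared(a, b, c, d):
    """Pearson chi-squared for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Proportion of each exposure bin with TotalEDInpatientVisits >= 2.
p_low = b / (a + b)
p_high = d / (c + d)

# Both ratios compare the low-exposure bin against the high-exposure bin.
relative_risk = p_low / p_high
odds_ratio = (b / a) / (d / c)

# The "enriched" stratum is the one with the higher proportion.
enriched = low_label if relative_risk > 1 else high_label

print(f"chi_squared   = {chi_squared(a, b, c, d):.4f}")
print(f"relative_risk = {relative_risk:.3f}")
print(f"odds_ratio    = {odds_ratio:.3f}")
print(f"enriched for >= 2 visits: {enriched}")
```

With these counts the relative risk comes out below 1 (about 0.42), so the AvgDailyPM2.5Exposure >= 3 bin is the one enriched for two or more visits. The chi-squared value alone does not say this.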

B. COHD example queries

Input: Asthma (ID #317009) and Black or African American (ID #8516)

Output: { "concept_2_count": 208438, "concept_id_1": 317009, "concept_id_2": 8516, "concept_pair_count": 11716, "dataset_id": 2, "relative_frequency": 0.05620856081904451 }

Input: Asthma (ID #317009) and White (ID #8527)

Output: { "concept_2_count": 601167, "concept_id_1": 317009, "concept_id_2": 8527, "concept_pair_count": 29913, "dataset_id": 2, "relative_frequency": 0.049758220261591206 }
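For reference, the relative_frequency values in both outputs are consistent with concept_pair_count divided by concept_2_count. The sketch below recomputes them from the counts and compares the two groups. Treating that ratio as the definition is an assumption to check against the COHD documentation, and the helper name is hypothetical.

```python
import json

# The two COHD outputs from above, pasted verbatim.
black_or_african_american = json.loads("""
{ "concept_2_count": 208438, "concept_id_1": 317009, "concept_id_2": 8516,
  "concept_pair_count": 11716, "dataset_id": 2,
  "relative_frequency": 0.05620856081904451 }
""")
white = json.loads("""
{ "concept_2_count": 601167, "concept_id_1": 317009, "concept_id_2": 8527,
  "concept_pair_count": 29913, "dataset_id": 2,
  "relative_frequency": 0.049758220261591206 }
""")


def relative_frequency(result):
    """Assumed definition: patients with both concepts / patients with concept 2."""
    return result["concept_pair_count"] / result["concept_2_count"]


for label, result in [("8516", black_or_african_american), ("8527", white)]:
    recomputed = relative_frequency(result)
    print(f"concept_id_2={label}: reported={result['relative_frequency']:.6f} "
          f"recomputed={recomputed:.6f}")

# Ratio of asthma relative frequency between the two groups.
ratio = relative_frequency(black_or_african_american) / relative_frequency(white)
print(f"ratio (8516 vs 8527) = {ratio:.3f}")
```

A ratio like this is one simple way to express which sub-cohort has the higher relative frequency for a given diagnosis.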

C. Clinical Profiles links

HAPI-FHIR

Custom Translator JHU Clinical Profiles Build

karafecho commented 5 years ago

See Green/Gamma TranQL implementation of Workflow 5, which is related to Workflow 4, here.

karafecho commented 5 years ago

WORKFLOW INPUT:

See ICEES_FeatureVariables and ICEES_Identifiers here for diagnoses. Note that these docs are updated as new variables are added to the ICEES integrated feature tables.

WORKFLOW (Gamma) QUESTION TEMPLATE:

Note that the second gene hop was added per ROBOKOP Neo4J constraints. If we can avoid this, great; if not, that's fine, too.

{
  "name": "Gamma WF4 template",
  "natural_question": "disease or phenotypic feature to gene to biological process/activity to gene to drug",
  "notes": "",
  "machine_question": {
    "nodes": [
      { "id": "n0", "curie": "MONDO:0008300", "name": "ObesityDx", "type": "disease or phenotypic feature" },
      { "id": "n1", "type": "gene" },
      { "id": "n2", "type": "biological_process_or_activity" },
      { "id": "n3", "type": "gene" },
      { "id": "n4", "type": "drug" }
    ],
    "edges": [
      { "id": "e0", "source_id": "n0", "target_id": "n1" },
      { "id": "e1", "source_id": "n1", "target_id": "n2" },
      { "id": "e2", "source_id": "n2", "target_id": "n3" },
      { "id": "e3", "source_id": "n3", "target_id": "n4" }
    ]
  }
}
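Since the template is a simple linear path, it can be generated from an ordered list of node types, which makes it harder to leave a node unconnected. This is only a sketch: linear_question is a hypothetical helper, not part of ROBOKOP or TranQL, and the type strings are copied from the template as written.

```python
import json


def linear_question(name, natural_question, start_curie, start_name, types):
    """Build a linear question: one node per type, one edge per consecutive pair."""
    nodes = [{"id": f"n{i}", "type": t} for i, t in enumerate(types)]
    # Only the first node is pinned to a specific identifier.
    nodes[0] = {"id": "n0", "curie": start_curie, "name": start_name, "type": types[0]}
    edges = [
        {"id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
        for i in range(len(types) - 1)
    ]
    return {
        "name": name,
        "natural_question": natural_question,
        "notes": "",
        "machine_question": {"nodes": nodes, "edges": edges},
    }


question = linear_question(
    name="Gamma WF4 template",
    natural_question=(
        "disease or phenotypic feature to gene to "
        "biological process/activity to gene to drug"
    ),
    start_curie="MONDO:0008300",
    start_name="ObesityDx",
    types=[
        "disease or phenotypic feature",
        "gene",
        "biological_process_or_activity",
        "gene",
        "drug",
    ],
)

print(json.dumps(question, indent=2))
```

To pre-compute queries for a list of ICEES diagnoses, the same call could be repeated with a different start_curie and start_name for each identifier.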

karafecho commented 5 years ago

ROBOKOP queries and RTX queries are being pre-computed for this workflow using all available ICEES phenotypes/diagnoses. Example ICEES queries are included below as an FYI:

curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"TotalEDInpatientVisits":{"operator":"<", "value":2}},"maximum_p_value":0.1}' -H "Accept: application/json"

curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"ur":{"operator":"=", "value":"U"}},"maximum_p_value":0.1}' -H "Accept: application/json"

curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"Sex2":{"operator":"=", "value":"Male"}},"maximum_p_value":0.1}' -H "Accept: application/json"
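The three curl calls differ only in the feature, operator, and value, so they can be built by one small helper. Below is a sketch using the Python standard library; the helper name is hypothetical, the URL layout and body are copied from the curl examples, and the requests are only constructed, not sent. Sending them would also need a TLS context that mirrors curl's -k flag.

```python
import json
import urllib.request

BASE_URL = "https://localhost:8080"  # same host as in the curl examples


def associations_request(feature, operator, value, maximum_p_value=0.1,
                         version="1.0.0", table="patient", year=2010,
                         cohort_id="COHORT:22"):
    """Build (but do not send) a POST to associations_to_all_features."""
    url = (f"{BASE_URL}/{version}/{table}/{year}/cohort/{cohort_id}"
           "/associations_to_all_features")
    body = {
        "feature": {feature: {"operator": operator, "value": value}},
        "maximum_p_value": maximum_p_value,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# The same three queries as the curl commands above.
example_requests = [
    associations_request("TotalEDInpatientVisits", "<", 2),
    associations_request("ur", "=", "U"),
    associations_request("Sex2", "=", "Male"),
]

for req in example_requests:
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```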

karafecho commented 5 years ago

Green/Gamma's initial plan is to refine end-to-end execution of WF4 using TranQL, with ICEES/COHD/Clinical Profiles executing modules 1-4 (input) and ROBOKOP/RTX/mediKanren executing modules 5-8.

karafecho commented 5 years ago

A mini-hackathon was held on Friday, April 12, 12-4 pm ET. Topic: Unified Translator-compliant Clinical Knowledge Source API. Attendees: Hao Xu, Richard Zhu, Casey Ta, Steve Cos, and Kara Fecho. The event was successful: the team developed a plan of action and is now executing it. The unified Translator Clinical Knowledge Source API will support efforts on Workflows 4 and 5, as well as any efforts related to COHD, Clinical Profiles, and ICEES.