karafecho opened this issue 5 years ago.
Question 1: ICEES functionality 4
Input parameters: features: TotalEDInpatientVisits (<2 or >=2); version: 1.0.0; table: patient; year: 2010; cohort_id: COHORT:22
Output (from full output): PM2.5, ozone, medications, and socio-economic exposures (socio-economic exposures cannot serve as input to downstream modules).
Output includes counts of patients by bin, adjusted chi-square statistics, and p-values.
Exposures that significantly differ between the two groups of patients will be used as input for module five, but separate streams of operations will be maintained with annotation indicating which group was "higher" and which group was "lower".
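A minimal Python sketch of the selection step described above, assuming each ICEES result has already been reduced to a p-value plus the percentage of each patient group that falls in the exposure bin. The `ExposureResult` record and `select_for_module_five` function are hypothetical (not part of the ICEES API); the default threshold of 0.1 mirrors the `maximum_p_value` used in the example ICEES queries further down. The first record uses the PM2.5 column percentages from the ICEES example output; the second record is made up.

```python
from dataclasses import dataclass


@dataclass
class ExposureResult:
    """Hypothetical summary of one ICEES exposure bin compared across two patient groups."""
    name: str
    p_value: float
    pct_group_a: float  # % of group A (e.g. TotalEDInpatientVisits < 2) in this bin
    pct_group_b: float  # % of group B (e.g. TotalEDInpatientVisits >= 2) in this bin


def select_for_module_five(results, max_p_value=0.1):
    """Keep significant exposures, split into two annotated streams by which group is higher."""
    streams = {"higher_in_a": [], "higher_in_b": []}
    for r in results:
        if r.p_value >= max_p_value:
            continue  # not significant: not passed to module five
        if r.pct_group_a > r.pct_group_b:
            streams["higher_in_a"].append(r.name)
        elif r.pct_group_b > r.pct_group_a:
            streams["higher_in_b"].append(r.name)
        # equal percentages carry no direction and are dropped
    return streams


results = [
    # Column percentages from the ICEES example output (PM2.5 bin < 3).
    ExposureResult("AvgDailyPM2.5Exposure < 3", 3.16593e-06, 5.85, 2.22),
    # Made-up, non-significant record for illustration.
    ExposureResult("HypotheticalExposure", 0.4, 10.0, 20.0),
]
print(select_for_module_five(results))
```

Keeping the two streams as separate lists preserves the "higher"/"lower" annotation without mixing the groups downstream.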
Question 2: ICEES functionality 4
Input parameters: features: EstResidentialDensity (<4 or >=4, or <5 or >=5; binning to be decided, or the variable will be rebinned); version: 1.0.0; table: patient; year: 2010; cohort_id: COHORT:22
*The US Census Bureau identifies two types of urban areas: Urbanized Areas (UAs) of 50,000 or more people and Urban Clusters (UCs) of at least 2,500 and less than 50,000 people. “Rural” encompasses all population, housing, and territory not included within one of the two types of urban areas.
The ICEES patient population is largely rural, so the Census Bureau definitions may not apply to this use case.*
Output (from full output): PM2.5, ozone, medications, and socio-economic features (socio-economic features cannot serve as input to downstream modules).
Output includes counts of patients by bin, adjusted chi-square statistics, and p-values.
Exposures that significantly differ between the two groups of patients will be used as input for module five, but separate streams of operations will be maintained with annotation indicating which group was "higher" and which group was "lower".
Note that Green/Gamma is working with the BioLink folks to develop high-level concepts for ICEES feature variables, in order to properly incorporate ICEES data into the BioLink data model.
US Census Bureau rural, urban definitions
The Census Bureau's urban-rural classification is fundamentally a delineation of geographical areas, identifying both individual urban areas and the rural areas of the nation. The Census Bureau's urban areas represent densely developed territory, and encompass residential, commercial, and other non-residential urban land uses. For the 2010 Census, an urban area will comprise a densely settled core of census tracts and/or census blocks that meet minimum population density requirements, along with adjacent territory containing non-residential urban land uses as well as territory with low population density included to link outlying densely settled territory with the densely settled core. To qualify as an urban area, the territory identified according to criteria must encompass at least 2,500 people, at least 1,500 of which reside outside institutional group quarters.
The Census Bureau identifies two types of urban areas:
Urbanized Areas (UAs) of 50,000 or more people; Urban Clusters (UCs) of at least 2,500 and less than 50,000 people. "Rural" encompasses all population, housing, and territory not included within an urban area.
The specific criteria used to define urban areas for the 2010 Census were published in the Federal Register of August 24, 2011.
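The population thresholds above can be expressed as a small classifier. This is a sketch only: it assumes the densely settled territory has already been delineated from census tracts and blocks (the density criteria are not modeled), and the mapping to a two-level 1 [rural] / 2 [urban] bin for EstResidentialDensity is an assumption, not the ICEES binning code. `classify_census_area` and `urban_rural_bin` are hypothetical names.

```python
def classify_census_area(population: int, non_institutional_population: int) -> str:
    """Apply the 2010 Census population thresholds to an already-delineated territory.

    Density-based delineation is NOT modeled here; only the population cut-offs are.
    """
    # Urban areas need at least 2,500 people, at least 1,500 outside institutional group quarters.
    if population < 2_500 or non_institutional_population < 1_500:
        return "Rural"
    if population >= 50_000:
        return "UA"  # Urbanized Area
    return "UC"  # Urban Cluster (2,500 to 49,999 people)


def urban_rural_bin(area_type: str) -> int:
    """Hypothetical two-level bin: 1 = rural, 2 = urban (UA or UC)."""
    return 1 if area_type == "Rural" else 2


for pop, non_inst in [(60_000, 58_000), (10_000, 9_000), (3_000, 1_000)]:
    area = classify_census_area(pop, non_inst)
    print(pop, area, urban_rural_bin(area))
```

Collapsing UAs and UCs into a single "urban" level matches the two-way split in the plan, but it discards the UA/UC distinction; a three-level bin would keep it.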
@xu-hao : Let's bin EstResidentialDensity as defined above by the US Census Bureau.
cc-ing @diatomsRcool who is working on ECTO exposures
Thanks, Chris!
@balhoff @stevencox : Let's coordinate with @diatomsRcool and perhaps loop in Sarav and Alex Valencia (his student).
Do we need a meeting? I'm really not up to speed on translator stuff.
Don't worry, I think it's premature to have a meeting, at most an ECTO ticket request
Agreed! My intent was simply to make sure that we coordinate (and not duplicate) efforts.
Updated plan for implementation of Workflow 5:
1. Use functionality four in ICEES to stratify/cluster by TotalEDInpatientVisits (<2 vs >=2) and return chemical exposures that demonstrate a significant difference between the strata. The exposures will be airborne pollutants and medications. The output list will be passed to ROBOKOP for execution of queries in the form: "chemical substance -> gene -> biological process/activity -> phenotype".
2. Use functionality four in ICEES to stratify/cluster by EstResidentialDensity (1 [rural] vs 2 [urban]) and return chemical exposures that demonstrate a significant difference between the strata. The exposures will be airborne pollutants and medications. The output list will be passed to ROBOKOP for execution of queries in the form: "chemical substance -> gene -> biological process/activity -> phenotype".
3. Use functionality four in ICEES to stratify/cluster by Sex2 (Male vs Female) and return chemical exposures that demonstrate a significant difference between the strata. The exposures will be airborne pollutants and medications. The output list will be passed to ROBOKOP for execution of queries in the form: "chemical substance -> gene -> biological process/activity -> phenotype".
4. Use COHD to stratify/cluster by Sex. See COHD UI and a query template plus a specific instance of Workflow 5. Retrieve the top 20 medications (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "chemical substance -> gene -> biological process/activity -> phenotype".
5. Use Clinical Profiles to identify/create sub-cohorts of males and females with asthma. Retrieve the top 20 medications (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "chemical substance -> gene -> biological process/activity -> phenotype".
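The last two items above (COHD and Clinical Profiles) both retrieve the top 20 medications by frequency for each sub-cohort. A minimal sketch of that ranking, assuming the results have already been flattened into hypothetical `(sub_cohort, medication)` records, one per occurrence; it does not call either API, and the medication names are placeholders.

```python
from collections import Counter


def top_medications_by_subcohort(records, n=20):
    """Rank medications by frequency within each sub-cohort.

    records: iterable of (sub_cohort, medication) pairs, one per occurrence
    (a hypothetical flattened form of COHD or Clinical Profiles results).
    """
    counts = {}
    for sub_cohort, medication in records:
        counts.setdefault(sub_cohort, Counter())[medication] += 1
    # most_common(n) returns (medication, count) pairs, highest count first
    return {group: [med for med, _ in c.most_common(n)] for group, c in counts.items()}


records = [("Male", "med_a"), ("Male", "med_a"), ("Male", "med_b"), ("Female", "med_c")]
print(top_medications_by_subcohort(records))
```

If COHD or Clinical Profiles already return per-medication counts, the same `Counter` can be filled with those counts directly instead of tallying individual records.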
Note re ICEES:
We will need to capture directionality as part of the output for the workflow. By "directionality", I mean which stratum is "enriched" for a given phenotype (i.e., has a higher percentage of patients with XXX). The chi-square statistic that ICEES provides indicates whether groups or bins differ, but it does not indicate the direction of the difference. Relative risks and odds ratios may suffice.
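Relative risk and odds ratio both carry the direction that the chi-square statistic lacks: a value above 1 means the first stratum is enriched for the outcome. The sketch below is a standard 2x2 computation, not an ICEES feature, and the function name is hypothetical. The counts in the usage lines are taken from the ICEES example output, treating AvgDailyPM2.5Exposure >= 3 as the first stratum and TotalEDInpatientVisits >= 2 as the outcome.

```python
def relative_risk_and_odds_ratio(a, b, c, d):
    """2x2 table of patient counts:

                     outcome    no outcome
        stratum 1       a           b
        stratum 2       c           d

    Returns (relative risk, odds ratio) of stratum 1 versus stratum 2.
    Values > 1 mean stratum 1 is enriched for the outcome; < 1 means stratum 2 is.
    A zero in b or c (or an empty stratum) raises ZeroDivisionError.
    """
    risk_1 = a / (a + b)
    risk_2 = c / (c + d)
    return risk_1 / risk_2, (a * d) / (b * c)


# Stratum 1: AvgDailyPM2.5Exposure >= 3; stratum 2: AvgDailyPM2.5Exposure < 3.
# Outcome: TotalEDInpatientVisits >= 2 (counts from the ICEES example output).
rr, odds = relative_risk_and_odds_ratio(a=1277, b=4776, c=29, d=297)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")  # RR = 2.37, OR = 2.74
```

Swapping the two strata inverts both measures, so the direction annotation follows directly from whether the value is above or below 1.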
Notes re (1)-(5) above:
A. ICEES example query
Input:
Feature variables: AvgDailyPM2.5Exposure < 3, TotalEDInpatientVisits < 2; version of data: 1.0.0; table: patient; year: 2010; cohort ID: COHORT:22
Output:*
+----------------------------+------------------------------+-------------------------------+---------+
| feature                    | TotalEDInpatientVisits < 2   | TotalEDInpatientVisits >= 2   |         |
+============================+==============================+===============================+=========+
| AvgDailyPM2.5Exposure < 3  | 297     91.10%               | 29      8.90%                 | 326     |
|                            | 5.85%   4.66%                | 2.22%   0.45%                 | 5.11%   |
+----------------------------+------------------------------+-------------------------------+---------+
| AvgDailyPM2.5Exposure >= 3 | 4776    78.90%               | 1277    21.10%                | 6053    |
|                            | 94.15%  74.87%               | 97.78%  20.02%                | 94.89%  |
+----------------------------+------------------------------+-------------------------------+---------+
|                            | 5073                         | 1306                          | 6379    |
|                            | 79.53%                       | 20.47%                        | 100.00% |
+----------------------------+------------------------------+-------------------------------+---------+

+-------------+---------------+
| p_value     | chi_squared   |
+=============+===============+
| 3.16593e-06 | 28.2841       |
+-------------+---------------+

Each cell in the first table shows the patient count and row percentage on its first line, and the column percentage and percentage of all patients on its second line.

*AvgDailyPM2.5Exposure < 3 range: 1.58 to 9.63 µg/m³; AvgDailyPM2.5Exposure >= 3 range: 9.63 to 17.33 µg/m³; TotalEDInpatientVisits = number of emergency department or inpatient visits for a respiratory issue over a one-year ‘study’ period (the example here is for calendar year 2010).
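As a sanity check on the output above, the Pearson chi-square statistic can be recomputed from the four cell counts. This is a minimal plain-Python sketch without a continuity correction; it gives about 28.28, matching the reported chi_squared, which suggests ICEES reports the uncorrected statistic for this table. The p-value is not recomputed here.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: AvgDailyPM2.5Exposure < 3, >= 3; columns: TotalEDInpatientVisits < 2, >= 2.
counts = [[297, 29], [4776, 1277]]
print(round(pearson_chi_square(counts), 4))  # 28.2841
```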
B. COHD example queries
Input: Asthma (ID #317009) and Black or African American (ID #8516)
Output: { "concept_2_count": 208438, "concept_id_1": 317009, "concept_id_2": 8516, "concept_pair_count": 11716, "dataset_id": 2, "relative_frequency": 0.05620856081904451 }
Input: Asthma (ID #317009) and White (ID #8527)
Output: { "concept_2_count": 601167, "concept_id_1": 317009, "concept_id_2": 8527, "concept_pair_count": 29913, "dataset_id": 2, "relative_frequency": 0.049758220261591206 }
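In both COHD outputs above, relative_frequency equals concept_pair_count divided by concept_2_count (11716 / 208438 and 29913 / 601167), so it appears to be the fraction of patients with concept 2 (race) who also have concept 1 (asthma). A small sketch that recomputes it from the JSON and compares the two groups; the function name is hypothetical, and the ratio is descriptive only (no significance test).

```python
import json

black = '{ "concept_2_count": 208438, "concept_id_1": 317009, "concept_id_2": 8516, "concept_pair_count": 11716, "dataset_id": 2, "relative_frequency": 0.05620856081904451 }'
white = '{ "concept_2_count": 601167, "concept_id_1": 317009, "concept_id_2": 8527, "concept_pair_count": 29913, "dataset_id": 2, "relative_frequency": 0.049758220261591206 }'


def recompute_relative_frequency(cohd_json: str) -> float:
    """Patients with both concepts / patients with concept 2."""
    record = json.loads(cohd_json)
    return record["concept_pair_count"] / record["concept_2_count"]


for label, payload in (("Black or African American", black), ("White", white)):
    reported = json.loads(payload)["relative_frequency"]
    print(label, round(recompute_relative_frequency(payload), 4), reported)

# Ratio of asthma relative frequencies, Black or African American vs White (about 1.13)
print(round(recompute_relative_frequency(black) / recompute_relative_frequency(white), 2))
```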
C. Clinical Profiles links
Hi Kara, just curious: is there any reason COHD runs the implementation only by Sex, instead of running the same experiments as ICEES so that we can compare or cross-validate afterwards? Is the plan proposed for the Hackathon? Thanks! Qian
Hi Qian. The variables defined in (1) and (2) above are specific to ICEES and not available in COHD. (3) and (4) are intended to cross-validate output, as you noted. I'm hoping to do something similar for Green Team's implementation of Workflow 4.
WRT the hackathon, I'm hoping that we can extend the plan above to include additional teams.
@stevencox @colinkcurtis @xu-hao : I'm wondering where we stand with (1) above, in terms of modules 1-4 and modules 5-8. I realize you all shifted your focus to (2), but I think (1) might serve best as a use case for SME evaluation (Dave Peden) during the hackathon. Plus, I'm developing a second ICEES manuscript that follows the first one and will focus on the outcome variable 'TotalEDInpatientVisits', so execution of (1) would align nicely with those efforts.
@karafecho I will pivot back to (1). My focus on (2) was incidental to what I had been doing. I'll post an update when I have an executable CWL/Ros WF5 for (1), tentatively before Monday.
@webyrd @dkoslicki : Please take a look at the above Green/Gamma action plan for execution of Workflow 5, Modules 1-4, as well as the action plan for execution of Workflow 5, Modules 5-8 (#37). If you're interested, I'd be happy to discuss approaches for Alpha and X-Ray to contribute to this workflow.
See TranQL implementation of Workflow 5, which is related to Workflow 4, here.
WORKFLOW INPUT:
See ICEES_FeatureVariables and ICEES_Identifiers here for chemicals and medications. Note that these docs are updated as new variables are added to the ICEES integrated feature tables.
WORKFLOW (Gamma) QUESTION TEMPLATE:
{
  "name": "Gamma WF5 template",
  "natural_question": "Chemical to gene to biological process/activity to phenotypic feature association.",
  "notes": "",
  "machine_question": {
    "nodes": [
      { "id": "n0", "curie": "PUBCHEM:441335", "name": "Mometasone", "type": "chemical_substance" },
      { "id": "n1", "type": "gene" },
      { "id": "n2", "type": "biological_process_or_activity" },
      { "id": "n3", "type": "phenotypic_feature" }
    ],
    "edges": [
      { "id": "e0", "source_id": "n0", "target_id": "n1" },
      { "id": "e1", "source_id": "n1", "target_id": "n2" },
      { "id": "e2", "source_id": "n2", "target_id": "n3" }
    ]
  }
}
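Because queries are pre-computed for every ICEES chemical and medication, one way to generate them is to copy the template above and swap in each chemical as node n0. A minimal sketch, assuming the CURIE and name come from the ICEES_Identifiers document; `question_for_chemical` is hypothetical and does not submit anything to ROBOKOP or RTX. The identifier in the usage line is a placeholder, not a real CURIE.

```python
import copy
import json

# The Gamma WF5 template above, as a Python dict.
TEMPLATE = {
    "name": "Gamma WF5 template",
    "natural_question": "Chemical to gene to biological process/activity to phenotypic feature association.",
    "notes": "",
    "machine_question": {
        "nodes": [
            {"id": "n0", "curie": "PUBCHEM:441335", "name": "Mometasone", "type": "chemical_substance"},
            {"id": "n1", "type": "gene"},
            {"id": "n2", "type": "biological_process_or_activity"},
            {"id": "n3", "type": "phenotypic_feature"},
        ],
        "edges": [
            {"id": "e0", "source_id": "n0", "target_id": "n1"},
            {"id": "e1", "source_id": "n1", "target_id": "n2"},
            {"id": "e2", "source_id": "n2", "target_id": "n3"},
        ],
    },
}


def question_for_chemical(curie, name):
    """Return a copy of the template with node n0 set to the given chemical."""
    question = copy.deepcopy(TEMPLATE)  # deep copy so the shared template is never mutated
    n0 = question["machine_question"]["nodes"][0]
    n0["curie"] = curie
    n0["name"] = name
    return question


# Placeholder identifier for illustration only.
print(json.dumps(question_for_chemical("EXAMPLE:0001", "Example chemical"), indent=2))
```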
ROBOKOP queries and RTX queries are being pre-computed for this workflow using all available ICEES chemicals and medications. Example ICEES queries are included below as an FYI:
curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"TotalEDInpatientVisits":{"operator":"<", "value":2}},"maximum_p_value":0.1}' -H "Accept: application/json"
curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"ur":{"operator":"=", "value":"U"}},"maximum_p_value":0.1}' -H "Accept: application/json"
curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"Sex2":{"operator":"=", "value":"Male"}},"maximum_p_value":0.1}' -H "Accept: application/json"
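For the Jupyter notebook approach, the curl calls above translate directly to Python's standard library. A minimal sketch that only builds the requests; the URL, payload, and headers are copied from the curl examples, while `associations_request` is a hypothetical helper. Sending is left commented out because the endpoint is a local deployment, and the curl examples use -k, which skips TLS certificate verification.

```python
import json
import ssl
import urllib.request

BASE_URL = "https://localhost:8080"  # local ICEES deployment used in the curl examples


def associations_request(feature, operator, value, maximum_p_value=0.1,
                         version="1.0.0", table="patient", year=2010,
                         cohort_id="COHORT:22"):
    """Build (but do not send) an associations_to_all_features POST request."""
    url = (f"{BASE_URL}/{version}/{table}/{year}/cohort/{cohort_id}"
           "/associations_to_all_features")
    body = {"feature": {feature: {"operator": operator, "value": value}},
            "maximum_p_value": maximum_p_value}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# The three stratifications from the curl examples above.
plan = [
    associations_request("TotalEDInpatientVisits", "<", 2),
    associations_request("ur", "=", "U"),
    associations_request("Sex2", "=", "Male"),
]

# Equivalent of curl -k (no certificate verification); only for a trusted local server.
insecure = ssl.create_default_context()
insecure.check_hostname = False
insecure.verify_mode = ssl.CERT_NONE
# with urllib.request.urlopen(plan[0], context=insecure) as response:
#     result = json.load(response)
```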
The Green/Gamma team's initial plan is to refine end-to-end execution of WF5 using TranQL, with ICEES, COHD, and Clinical Profiles executing modules 1-4 (input) and ROBOKOP, RTX, and mediKanren executing modules 5-8.
Mini-hackathon was held on Friday, April 12, 12-4 pm ET. Topic: Unified Translator-compliant Clinical Knowledge Source API. Attendees: Hao Xu, Richard Zhu, Casey Ta, Steve Cox, and Kara Fecho. The event was successful: the team developed a plan of action and is moving forward with its execution. The unified Translator Clinical Knowledge Source API will support efforts on Workflows 4 and 5, as well as any efforts related to COHD, Clinical Profiles, and ICEES.
Scroll down to find updates to the plan.
Overview
Green/Gamma Team is approaching Workflow 5 using ICEES as the source of clinical data. In consideration of the design of ICEES, the team has decided to collapse Modules one through four of the workflow. In addition, a Jupyter notebook will be used to call ICEES and integrate with Gamma for subsequent modules.
Two questions will be asked:
This question will be fully evaluated by a SME (D. Peden) and serve as the basis of a TIDBIT.
This question will allow us to begin to more thoroughly explore the ACS data available through our Socioenvironmental Exposures API in the context of a workflow. In particular, the question will allow us to "stress test" our binning strategy.
Note that the output of modules one through four will be of the same entity type for both Question 1 and Question 2; thus, subsequent modules for workflow 5 will be identical for both questions.