zietzm / phenotype_standardization

https://github.com/zietzm/phenotype_standardization/wiki
4 stars 0 forks source link

Standardized phenotype definitions for COVID-19 analyses #1

Closed zietzm closed 4 years ago

zietzm commented 4 years ago

I've started using direct SQL queries with a custom function.

sql_to_tibble <- function(sql_code, .con = con) {
    sql_code %>%
        str_glue %>%
        sql %>%
        tbl(src = .con) %>%
        as_tibble
}

For example:

obese_by_bmi <- '
    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_vitals_noname
    WHERE bmi > 30 AND date_retrieved="{date}"
    ' %>%
    sql_to_tibble

While this makes things easier in general, just remember that when you want to use the { or } characters (which are common in descriptions in the Epic tables), just double the character (eg. {{NYP}}).

zietzm commented 4 years ago

SARS-CoV-2 infection test result

Many people have multiple tests. The goal is to find their first positive test if they had a positive test. This finds first the maximum infection value, then the minimum date for that patient and infection value.

SELECT pat_mrn_id, MIN(result_date) AS covid_result_date, MAX(cov_pos) AS cov_pos
FROM (
    SELECT pat_mrn_id, result_date, CAST(ord_value LIKE "Detected%" AS DECIMAL(2, 0)) AS cov_pos
    FROM 1_covid_labs_noname
    WHERE date_retrieved="{date}" AND 
        ord_value NOT IN ("Invalid", "Indeterminate", "Nasopharyngeal", "Not Given", 
                          "Void", "See Comment", "Yes")
) AS all_test_patients
INNER JOIN 
(
    SELECT pat_mrn_id, MAX(CAST(ord_value LIKE "Detected%" AS DECIMAL(2, 0))) AS cov_pos
    FROM 1_covid_labs_noname
    WHERE date_retrieved="{date}"
    GROUP BY pat_mrn_id
) AS any_test_positive USING (pat_mrn_id, cov_pos)
GROUP BY pat_mrn_id

COVID-19 diagnosis

SELECT DISTINCT pat_mrn_id, 1 AS cov_diagnosed, MIN(DATE(contact_date_string)) AS covid_diagnosis_date
FROM 1_covid_patients_noname
WHERE covid_diagnosis="Y" AND date_retrieved="{date}"
GROUP BY pat_mrn_id
zietzm commented 4 years ago

Diabetes mellitus

Defined using descendants of OMOP 201820, ICD 10 CM E08-E13, or HbA1c >= 6.5%.

HbA1c measured using any of the following:

  1. LOINC 4548-4 (Hemoglobin A1c/Hemoglobin.total in Blood)
  2. LOINC 17856-6 (Hemoglobin A1c/Hemoglobin.total in Blood by HPLC)
  3. LOINC 55454-3 (Deprecated Hemoglobin A1c in Blood)

Query ran in 35 seconds on 2020-06-09

SELECT DISTINCT pat_mrn_id
FROM (
    # From OMOP table (SNOMED)
    SELECT DISTINCT pat_mrn_id
    FROM concept_ancestor 
    INNER JOIN condition_occurrence ON descendant_concept_id = condition_concept_id
    INNER JOIN 1_covid_patient2person USING (person_id)
    WHERE ancestor_concept_id = 201820

    UNION ALL

    # From COVID table (ICD10)
    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_patients_noname
    WHERE date_retrieved = "{date}" AND (
        icd10_code LIKE "E08%" OR icd10_code LIKE "E09%" OR 
        icd10_code LIKE "E10%" OR icd10_code LIKE "E11%" OR icd10_code LIKE "E13%"
    )

    UNION ALL 

    # From OMOP table (HbA1c >= 6.5)
    SELECT DISTINCT pat_mrn_id
    FROM measurement 
    INNER JOIN 1_covid_patient2person USING (person_id)
    WHERE measurement_concept_id IN (3004410, 3005673, 40758583) AND 
        CAST(REPLACE(REPLACE(REPLACE(value_source_value, "<", ""), ">", ""), "%", "") 
             AS DECIMAL(4, 3)) >= 6.5

    UNION ALL

    # From COVID table (HbA1c >= 6.5)
    SELECT DISTINCT component_loinc_code
    FROM 1_covid_measurements_noname
    WHERE date_retrieved = "{date}" AND component_loinc_code IN (4548, 17856) AND
        CAST(REPLACE(REPLACE(REPLACE(ord_value, "<", ""), ">", ""), "%", "") 
             AS DECIMAL(4, 3)) >= 6.5
) AS dm_patients

In the future, we could consider adding (fasting) glucose measurements also.

zietzm commented 4 years ago

Obesity

BMI >= 30, descendants of OMOP 4215968, BMI percentile >= 95%. Measurements 2019 or later.

SELECT DISTINCT pat_mrn_id 
FROM (
    # From COVID table (BMI > 30)
    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_vitals_noname
    WHERE bmi >= 30 AND date_retrieved="{date}"

    UNION ALL

    # From OMOP table (BMI > 30)
    SELECT DISTINCT pat_mrn_id
    FROM measurement
    INNER JOIN 1_covid_patient2person USING (person_id)
    WHERE measurement_concept_id = 3038553 AND measurement_date > "2019-01-01" AND 
        value_as_number >= 30

    UNION ALL

    # From OMOP table (diagnosis code descendant of 4215968)
    SELECT DISTINCT pat_mrn_id
    FROM observation 
    INNER JOIN concept_ancestor ON observation_concept_id = descendant_concept_id
    INNER JOIN 1_covid_patient2person USING (person_id)
    WHERE ancestor_concept_id = 4215968

    UNION ALL

    # From OMOP table (BMI percentile > 95%)
    SELECT DISTINCT pat_mrn_id
    FROM measurement
    INNER JOIN 1_covid_patient2person USING (person_id)
    WHERE measurement_concept_id = 40762636 AND measurement_date > "2019-01-01" AND 
        value_as_number >= 95 
) AS ob_patients
zietzm commented 4 years ago

Proteinuria

Descendants of OMOP 75650 or protein urine tests with results: >14 mg/dl for random/point-of-care urine test or >150 mg/dl for 24 tests where the reference value is <= 150 mg/dl or results listed as 1+, 2+, or 3+ on a dipstick test

SELECT DISTINCT pat_mrn_id
FROM (
    SELECT DISTINCT pat_mrn_id
    FROM condition_occurrence
    INNER JOIN concept_ancestor ON condition_concept_id = descendant_concept_id
    INNER JOIN 1_covid_patient2person USING (person_id)
    WHERE ancestor_concept_id = 75650

    UNION ALL

    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_measurements_noname
    WHERE date_retrieved = "{date}" AND (
        # Dipstick urine protein test
        (component_id = 22285 AND ord_value IN ("1+", "2+", "3+")) OR
        # Point-of-care protein urine test strip
        (component_id = 58040 AND ord_value IN ("30", "100", "300", ">300", ">=300", ">=500", ">=1000")) OR
        # Random urine protein test (> 14 mg/dl)
        (component_id = 28886 AND CAST(REPLACE(REPLACE(ord_value, "<", ""), ">", "") AS DECIMAL(10, 5)) > 14) OR
        # 24 hour urine protein (> 150 mg/24hr is the listed reference)
        (component_id = 28894 AND CAST(ord_value AS DECIMAL(10, 5)) > 150) OR
        # Protein/creatinine ratio (reference is <=0.2 mg/mg or <=200 mg/g)
        (component_loinc_code = 2890 AND component_name != "PROTEIN TOTAL, URINE {{EXT}}" AND 
         ord_value NOT IN ("NCAL", "NOTE", "Cannot Calculate") AND (
             (reference_unit = "mg/g creat" AND ord_num_value > 200) OR
             (reference_unit = "mg/mg creat" AND CAST(ord_value AS DECIMAL(5, 5)) > 0.2)
         )
        )
    )
) AS proteinuria_patients
zietzm commented 4 years ago

Albinuria

(micro or macro) >= 2 elevated measurements of one of the following:

  1. LOINC 14956-7 (Microalbumin [Mass/time] in 24 hour Urine)
  2. LOINC 14957-5 (Microalbumin [Mass/volume] in Urine)
  3. LOINC 14958-3 (Microalbumin/Creatinine [Mass Ratio] in 24 hour Urine)

Reference ranges provided in the table.

SELECT DISTINCT pat_mrn_id
FROM 1_covid_measurements_noname
WHERE date_retrieved = "{date}" AND component_loinc_code IN (14956, 14957, 14958) AND (
    (reference_unit IN ("mg/dL", "mg/24hr", "mcg/mg creat") AND CAST(REPLACE(REPLACE(REPLACE(ord_value, ">", ""), "<", ""), "=", "") AS DECIMAL(10, 5)) > 30) OR
    (reference_unit = "mg/L" AND CAST(REPLACE(REPLACE(REPLACE(ord_value, ">", ""), "<", ""), "=", "") AS DECIMAL(10, 5)) > 300)
)
GROUP BY pat_mrn_id
HAVING COUNT(DISTINCT result_date) >= 2

Reduced eGFR

One abnormally high estimate of one of the following:

  1. LOINC 48642-3 (Glomerular filtration rate/1.73 sq M.predicted among non-blacks [Volume Rate/Area] in Serum, Plasma or Blood by Creatinine-based formula (MDRD))
  2. LOINC 48643-1 (Glomerular filtration rate/1.73 sq M.predicted among blacks [Volume Rate/Area] in Serum, Plasma or Blood by Creatinine-based formula (MDRD))
  3. LOINC 88294-4 (Glomerular filtration rate/1.73 sq M.predicted among non-blacks [Volume Rate/Area] in Serum, Plasma or Blood by Creatinine-based formula (CKD-EPI))
  4. LOINC 50210-4 (Glomerular filtration rate/1.73 sq M.predicted [Volume Rate/Area] in Serum, Plasma or Blood by Cystatin-based formula)

All GFR estimates used below have >= 60 as the reference range.

SELECT DISTINCT pat_mrn_id
FROM 1_covid_measurements_noname
WHERE date_retrieved = "{date}" AND ord_value REGEXP "^[0-9\\.]+$" AND
    (component_loinc_code IN (48642, 48643, 88294, 50210) OR component_id = 10237) AND
    ord_num_value < 60

Elevated plasma creatinine

>= 2 elevated (wrt adult and sex reference) measurements of

Abnormal for males >= 1.3 mg/dL, females >= 1.2 mg/dL (conservatively since reference for age < 60 females is 0.6 - 1.1 mg/dL)

Source for reference ranges https://labtestsonline.org/tests/creatinine

SELECT DISTINCT pat_mrn_id
FROM 1_covid_measurements_noname
INNER JOIN 1_covid_persons_noname USING (pat_mrn_id)
WHERE date_retrieved = "{date}" AND component_loinc_code = 2160 AND 
    ord_value REGEXP "^[0-9\\.]+$" AND (
    (sex_desc = "Male" AND CAST(ord_value AS DECIMAL(10, 5)) > 1.3) OR 
    (sex_desc = "Female" AND CAST(ord_value AS DECIMAL(10, 5)) > 1.2)
)
GROUP BY pat_mrn_id
HAVING COUNT(DISTINCT result_date) >= 2

Diabetic nephropathy

Using the definitions above, define diabetic nephropathy as both albinuria and one of reduced eGFR or elevated plasma creatinine. Source https://doi.org/10.1038/nrendo.2013.184

Alternatively, diagnosis of descendants of SNOMED CT 127013003 (Disorder of kidney due to diabetes mellitus) or one of the following:

  1. ICD-10 CM E08.21 (Diabetes mellitus due to underlying condition with diabetic nephropathy)
  2. ICD-10 CM E09.21 (Drug or chemical induced diabetes mellitus with diabetic nephropathy)
  3. ICD-10 CM E10.21 (Type 1 diabetes mellitus with diabetic nephropathy)
  4. ICD-10 CM E11.21 (Type 2 diabetes mellitus with diabetic nephropathy)
  5. ICD-10 CM E13.21 (Other specified diabetes mellitus with diabetic nephropathy)
SELECT DISTINCT pat_mrn_id 
FROM (
    # Albuminuria
    SELECT DISTINCT pat_mrn_id 
    FROM (
        SELECT DISTINCT pat_mrn_id
        FROM 1_covid_measurements_noname
        WHERE date_retrieved = "{date}" AND component_loinc_code IN (14957, 14956, 14958) AND (
            (reference_unit IN ("mg/dL", "mg/24hr", "mcg/mg creat") AND CAST(REPLACE(REPLACE(REPLACE(ord_value, ">", ""), "<", ""), "=", "") AS DECIMAL(10, 5)) > 30) OR
            (reference_unit = "mg/L" AND CAST(REPLACE(REPLACE(REPLACE(ord_value, ">", ""), "<", ""), "=", "") AS DECIMAL(10, 5)) > 300)
        )
        GROUP BY pat_mrn_id
        HAVING COUNT(DISTINCT result_date) >= 2
    ) AS albuminuria_patients
    INNER JOIN (
        # Increased plasma creatinine
        SELECT DISTINCT pat_mrn_id
        FROM 1_covid_measurements_noname
        INNER JOIN 1_covid_persons_noname USING (pat_mrn_id)
        WHERE date_retrieved = "{date}" AND component_loinc_code = 2160 AND 
            ord_value REGEXP "^[0-9\\.]+$" AND (
            (sex_desc = "Male" AND CAST(ord_value AS DECIMAL(10, 5)) > 1.3) OR 
            (sex_desc = "Female" AND CAST(ord_value AS DECIMAL(10, 5)) > 1.2)
        )
        GROUP BY pat_mrn_id
        HAVING COUNT(DISTINCT result_date) >= 2

        UNION ALL

        # Reduced eGFR
        SELECT DISTINCT pat_mrn_id
        FROM 1_covid_measurements_noname
        WHERE date_retrieved = "{date}" AND ord_value REGEXP "^[0-9\\.]+$" AND
            (component_loinc_code IN (48642, 48643, 88294, 50210) OR component_id = 10237) AND
            ord_num_value < 60
    ) AS plasma_creatinine_egfr_patients USING (pat_mrn_id)

    UNION ALL 

    # Diagnosis codes of nephropathy
    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_patients_noname 
    WHERE date_retrieved = "{date}" AND 
        REPLACE(icd10_code, ",", "") IN ("E08.21", "E09.21", "E10.21", "E11.21", "E13.21")

    UNION ALL

    # From OMOP table (descendants of 192279)
    SELECT DISTINCT pat_mrn_id
    FROM condition_occurrence
    INNER JOIN concept_ancestor ON condition_concept_id = descendant_concept_id
    INNER JOIN 1_covid_patient2person using (person_id)
    WHERE ancestor_concept_id = 192279
) as diabetic_nephropathy_patients
zietzm commented 4 years ago

Diabetic retinopathy

Using a single diagnosis of any descendant of SNOMED CT 4855003 (Retinopathy due to diabetes mellitus) or ICD-10 CM terms containing "retinopathy" but not "without retinopathy" within the ICD-10 CM diabetes mellitus range (E08, E09, E10, E11, E13).

Note: there are a lot of ICD-10 CM codes describing diabetic retinopathy (93 used in the COVID table alone as of June 10), so I chose not to list them explicitly.

SELECT DISTINCT pat_mrn_id
FROM (
    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_patients_noname 
    WHERE date_retrieved = "{date}" AND icd10_desc LIKE "%retinopathy%" AND 
        icd10_desc NOT LIKE "%without retinopathy%" AND (
            icd10_code LIKE "E08%" OR icd10_code LIKE "E09%" OR 
            icd10_code LIKE "E10%" OR icd10_code LIKE "E11%" OR 
            icd10_code LIKE "E13%"
        )

    UNION ALL

    SELECT DISTINCT pat_mrn_id
    FROM condition_occurrence
    INNER JOIN concept_ancestor ON condition_concept_id = descendant_concept_id
    INNER JOIN 1_covid_patient2person using (person_id)
    WHERE ancestor_concept_id = 4174977
) AS diabetic_retinopathy_patients
zietzm commented 4 years ago

Diabetic neuropathy

Using a single diagnosis of any descendant of SNOMED CT 422088007 (Disorder of nervous system due to diabetes mellitus) or ICD-10 CM terms containing "neuropathy" within the ICD-10 CM diabetes mellitus range (E08, E09, E10, E11, E13).

Note: there are a lot of ICD-10 CM codes describing diabetic neuropathy (22 used in the COVID table alone as of June 10), so I chose not to list them explicitly.

SELECT DISTINCT pat_mrn_id
FROM (
    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_patients_noname 
    WHERE date_retrieved = "{date}" AND icd10_desc LIKE "%neuropathy%" AND
        (icd10_code LIKE "E08%" OR icd10_code LIKE "E09%" OR icd10_code LIKE "E10%" OR 
         icd10_code LIKE "E11%" OR icd10_code LIKE "E13%")

    UNION ALL

    SELECT DISTINCT pat_mrn_id
    FROM condition_occurrence
    INNER JOIN concept_ancestor ON condition_concept_id = descendant_concept_id
    INNER JOIN 1_covid_patient2person using (person_id)
    WHERE ancestor_concept_id = 443730
) AS diabetic_neuropathy_patients
zietzm commented 4 years ago

Diabetic vasculopathy

Using a single diagnosis of any descendant of SNOMED CT 421895002 (Peripheral vascular disorder due to diabetes mellitus) or ICD-10 CM terms containing "peripheral angiopathy" or "circulatory complications" within the ICD-10 CM diabetes mellitus range (E08, E09, E10, E11, E13). This corresponds to suffixes 51, 52, and 59 in the diabetes mellitus range of ICD-10 CM.

SELECT DISTINCT pat_mrn_id
FROM (
    SELECT DISTINCT pat_mrn_id
    FROM 1_covid_patients_noname 
    WHERE date_retrieved = "{date}" AND 
        REPLACE(icd10_code, ",", "") REGEXP "^E(08|09|10|11|13).(51|52|59)$"

    UNION ALL

    SELECT DISTINCT pat_mrn_id
    FROM condition_occurrence
    INNER JOIN concept_ancestor ON condition_concept_id = descendant_concept_id
    INNER JOIN 1_covid_patient2person using (person_id)
    WHERE ancestor_concept_id = 321822
) AS diabetic_retinopathy_patients
zietzm commented 4 years ago

Information moved to the wiki.

Closing this issue